Index to Learn the Big Data Hadoop Framework





Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.
The Apache Hadoop framework is composed of the following modules:
  • Hadoop Common – contains libraries and utilities needed by the other Hadoop modules.
  • Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
  • Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
  • Hadoop MapReduce – a programming model for large-scale data processing.
Hadoop HDFS
What is HDFS (Hadoop Distributed File System)? HDFS Architecture, File System, Importing and Exporting Data, Listing Files, Creating Directories, Removing Directories, Plugins for Eclipse, Configuration in the IDE.

MapReduce
Introduction to MapReduce, Relating MapReduce to Cloud Computing, How the Mapper Works, How the Reducer Works, MapReduce Problems and Solutions, Writing a Test MapReduce Program, Common MapReduce Algorithms, Sorting and Searching, Indexing, Deeper Knowledge of the Hadoop API, Using Combiners, Reducing Intermediate Data with Combiners, Writing Partitioners for Better Load Balancing, Directly Accessing HDFS, Hands-On Exercise.
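To make the Mapper, Combiner, Partitioner and Reducer roles concrete, here is a minimal sketch of a word count that simulates the flow in plain Python on a single machine. It does not use the Hadoop API. The function names (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`) and the CRC32-based partitioning are assumptions chosen for illustration, and each input line is treated as one map task.

```python
import zlib
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine phase: pre-sum counts on the map side to shrink intermediate data."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partitioner(key, num_reducers):
    """Choose a reducer for a key; a stable hash sends equal keys to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(word, counts):
    """Reduce phase: sum all partial counts received for one word."""
    return word, sum(counts)


def run_job(lines, num_reducers=2):
    """Simulate the job: map and combine per line, shuffle by partition, then reduce."""
    # One dict per simulated reducer: word -> list of partial counts.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one map task (illustrative assumption)
        for word, count in combiner(mapper(line)):
            partitions[partitioner(word, num_reducers)][word].append(count)

    result = {}
    for part in partitions:
        for word in sorted(part):  # reducers see their keys in sorted order
            key, total = reducer(word, part[word])
            result[key] = total
    return result


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node data"]))
```

The combiner is what reduces the intermediate data: the line "big data big cluster" sends one ("big", 2) pair to the shuffle instead of two ("big", 1) pairs. A custom partitioner matters when keys are skewed, because it decides how evenly the work is spread over the reducers.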

 

PIG
Introduction to Pig, Where Does It Fit In? Getting Started with Pig Development, Loading and Displaying Data, Basic Data Filters, Pig Schemas, Hands-On Exercise, Pig Latin In-Depth, Pig Data Types, More Advanced Dataset Filtering, Pig Expressions and Functions, Grouping and Sorting Data, Hands-On Exercise, Joining Multiple Datasets, Validating Datasets, Storing Data, User-Defined Functions, Using Functions in Pig, Hands-On Exercise.


HIVE 
Introduction to Hive, Hive Architecture, Hive Interfaces, The Hive CLI, Getting Data into Hive, Creating Tables, Data Types, Loading Data, SerDe, External Tables, HiveQL, SQL vs. HiveQL, SELECT/GROUP BY, Functions/Subqueries, Custom Map/Reduce Scripts, Joins/Inserting, Hands-On Exercise: Writing Queries in HiveQL, Partitioning and Bucketing, Creating Partitions, Loading Data into Partitions, Bucketing, Sampling, Hands-On Exercise: Using Partitioning and Bucketing, Best Practices for Hive, Configuring Hive, Handling Data in Hive, Hands-On Exercise: Loading Data into Hive, Importing and Exporting Data from MySQL.

HBASE
What is HBase? (NoSQL Database), Schema Modeling, The HBase Shell, The HBase Architecture, HBase Java APIs, Creating HBase Data Using Java Client Programs, ZooKeeper.

SQOOP
Sqoop Overview, Installation, Imports and Exports, Importing and Exporting Data Between HDFS and an RDBMS (MySQL).

FLUME
Flume Overview, Installation, Importing and Exporting Data, Importing Streaming Data.

MAHOUT
What is Mahout? What is Machine Learning? Machine Learning Algorithms: Recommender, Clustering, Classification. User-Based Recommender, Item-Based Recommender, Hands-On Exercise, Predictive Learning Methods.
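
As a rough idea of what a user-based recommender does, the sketch below scores items a user has not rated yet, using the ratings of the most similar users. It is plain Python and not Mahout's API. The function names, the choice of cosine similarity, and the sample ratings are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[item] * v[item] for item in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(ratings, user, k=2):
    """Return (item, predicted_rating) pairs for items the user has not rated."""
    # Rank every other user by similarity and keep the k nearest neighbours.
    neighbours = sorted(
        ((cosine(ratings[user], other_ratings), other)
         for other, other_ratings in ratings.items() if other != user),
        reverse=True,
    )[:k]

    scores = defaultdict(float)   # similarity-weighted sum of ratings per item
    weights = defaultdict(float)  # sum of similarities per item
    for similarity, other in neighbours:
        if similarity <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                scores[item] += similarity * rating
                weights[item] += similarity

    predictions = [(item, scores[item] / weights[item]) for item in scores]
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data: user -> {item: rating}.
    sample = {
        "alice": {"a": 5, "b": 3},
        "bob": {"a": 5, "b": 3, "c": 4},
        "carol": {"d": 2},
    }
    print(recommend(sample, "alice"))
```

An item-based recommender follows the same pattern but compares items by the users who rated them, instead of comparing users by the items they rated.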
