Introduction to Hadoop
- High Availability
- Scaling
- Advantages and Challenges
Introduction to Big Data
- What is Big Data?
- Big Data Opportunities
- Big Data Challenges
- Characteristics of Big Data
Introduction to Hadoop
- Hadoop Distributed File System
- Comparing Hadoop & SQL.
- Industries using Hadoop.
- Data Locality.
- Hadoop Architecture.
- MapReduce & HDFS.
- Using the Hadoop single node image (Clone).
The Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, NameNodes and DataNodes
- HDFS High Availability and HDFS Federation.
- The HDFS Command-Line Interface
- Basic File System Operations (a Java FileSystem API sketch follows this section)
- Anatomy of File Read
- Anatomy of File Write
- Block Placement Policy and Modes
- Configuration files in more detail.
- Metadata, FsImage, Edit Log, Secondary NameNode and Safe Mode.
- How to add a new DataNode dynamically.
- How to decommission a DataNode dynamically (without stopping the cluster).
- FSCK utility (block report).
- How to override the default configuration at the system level and the program level.
- ZooKeeper Leader Election Algorithm.
- Exercise and small use case on HDFS.
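For the command-line and basic file system operation topics above, here is a minimal sketch using the Hadoop FileSystem Java API; the directory and file names are placeholders, and the cluster address is taken from core-site.xml on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBasicOps {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Create a directory and write a small file (equivalent to -mkdir / -put)
        Path dir = new Path("/user/training/demo");   // placeholder path
        fs.mkdirs(dir);
        try (FSDataOutputStream out = fs.create(new Path(dir, "hello.txt"))) {
            out.writeUTF("hello hdfs");
        }

        // List the directory (equivalent to -ls)
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + " " + status.getLen());
        }

        fs.close();
    }
}
```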
MapReduce
- Functional Programming Basics.
- Map and Reduce Basics
- How MapReduce Works
- Anatomy of a MapReduce Job Run
- Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
- Job Completion, Failures
- Shuffling and Sorting
- Splits, RecordReader, Partitioner, types of partitions & Combiner (a custom Partitioner sketch follows this section)
- Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots.
- Types of Schedulers and Counters.
- Comparison between the old and new APIs at the code and architecture level.
- Getting data from an RDBMS into HDFS using custom data types.
- Distributed Cache and Hadoop Streaming (Python, Ruby and R).
- YARN.
- Sequence Files and Map Files.
- Enabling Compression Codecs.
- Map side Join with distributed Cache.
- Types of I/O Formats: MultipleOutputs, NLineInputFormat.
- Handling small files using CombineFileInputFormat.
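For the partitioner and combiner topics above, a small illustrative sketch of a custom Partitioner, with the driver wiring shown in comments; the class name and partitioning rule are invented for illustration only:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes keys to reducers by their first character so related keys land together.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}

// In the job driver:
//   job.setPartitionerClass(FirstLetterPartitioner.class);
//   job.setCombinerClass(MyReducer.class);   // combiner typically reuses the reducer logic
//   job.setNumReduceTasks(4);
```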
MapReduce Programming – Java Programming
- Hands-on “Word Count” in MapReduce in standalone and pseudo-distributed mode (a reference sketch follows this section).
- Sorting files using Hadoop Configuration API discussion
- Emulating “grep” for searching inside a file in Hadoop
- DBInputFormat
- Job Dependency API discussion
- Input Format API discussion
- Input Split API discussion
- Custom Data type creation in Hadoop.
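For reference, the classic “Word Count” used in this hands-on session generally looks like the sketch below (new-API version; not necessarily the exact lab code):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```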
NOSQL
- ACID in RDBMS and BASE in NoSQL.
- CAP Theorem and Types of Consistency.
- Types of NoSQL Databases in detail.
- Columnar Databases in Detail (HBase and Cassandra).
- TTL, Bloom Filters and Compaction.
HBase
- HBase Installation
- HBase concepts
- HBase Data Model and Comparison between RDBMS and NOSQL.
- Master & Region Servers.
- HBase Operations (DDL and DML) through the Shell and programmatically, and HBase Architecture.
- Catalog Tables.
- Block Cache and sharding.
- SPLITS.
- Data Modeling (Sequential, Salted, Promoted and Random Keys).
- Java APIs and REST Interface.
- Client-Side Buffering and processing 1 million records using client-side buffering.
- HBASE Counters.
- Enabling Replication and HBASE RAW Scans.
- HBASE Filters.
- Bulk Loading and Coprocessors (Endpoints and Observers with programs).
- Real-world use case combining HDFS, MapReduce and HBase (an HBase Java client sketch follows this section).
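A minimal sketch of the HBase Java client API for the DML operations above; the table name, column family, qualifier and row key are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseQuickstart {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {   // placeholder table

            // Put: write one cell (row key, column family, qualifier, value)
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Get: read the cell back
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```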
Hive
- Installation
- Introduction and Architecture.
- Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
- Metastore
- Hive QL
- OLTP vs. OLAP
- Working with Tables.
- Primitive data types and complex data types.
- Working with Partitions.
- User Defined Functions
- Hive Bucketed Tables and Sampling.
- External partitioned tables, mapping data to partitions in the table, writing the output of one query to another table, multiple inserts
- Dynamic Partition
- Differences between ORDER BY, DISTRIBUTE BY and SORT BY.
- Bucketing and Sorted Bucketing with Dynamic partition.
- RCFile.
- Indexes and Views.
- Map-Side Joins.
- Compression on Hive tables and migrating Hive tables.
- Dynamic substitution in Hive and different ways of running Hive
- How to enable updates in Hive.
- Log Analysis on Hive.
- Accessing HBase tables using Hive.
- Hands-on Exercises (a HiveServer2 JDBC sketch follows this section)
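The HiveQL topics above can be exercised from the Hive shell or, as sketched below, through HiveServer2 over JDBC; the connection URL, table and columns here are assumptions chosen for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcQuickstart {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; adjust host, port and database as needed
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";   // assumed local HiveServer2

        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {

            // Partitioned table, as covered under "Working with Partitions"
            stmt.execute("CREATE TABLE IF NOT EXISTS page_views (user_id STRING, url STRING) "
                    + "PARTITIONED BY (view_date STRING)");

            // Simple aggregate query
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT view_date, COUNT(*) FROM page_views GROUP BY view_date")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```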
Pig
- Installation
- Execution Types
- Grunt Shell
- Pig Latin
- Data Processing
- Schema on read
- Primitive data types and complex data types.
- Tuple schema, BAG Schema and MAP Schema.
- Loading and Storing
- Filtering
- Grouping & Joining
- Debugging commands (Illustrate and Explain).
- Validations in PIG.
- Type casting in PIG.
- Working with Functions
- User Defined Functions
- Types of Joins in Pig and Replicated Join in detail.
- SPLIT and multi-query execution.
- Error Handling, FLATTEN and ORDER BY.
- Parameter Substitution.
- Nested FOREACH.
- User Defined Functions, Dynamic Invokers and Macros.
- How to access HBase using Pig.
- How to load and write JSON data using Pig.
- PiggyBank.
- Hands-on Exercises (an embedded PigServer sketch follows this section)
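Pig Latin statements can also be run from Java in embedded mode via the PigServer class, which is one way to script the exercises above; the input file name and schema below are placeholders:

```java
import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigEmbeddedQuickstart {
    public static void main(String[] args) throws Exception {
        // Local mode for quick testing; use ExecType.MAPREDUCE against a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);

        // LOAD / FILTER / GROUP, as in the Pig Latin topics above (path is a placeholder)
        pig.registerQuery("logs = LOAD 'access_log.txt' USING PigStorage(' ') "
                + "AS (ip:chararray, url:chararray, bytes:long);");
        pig.registerQuery("big = FILTER logs BY bytes > 1024;");
        pig.registerQuery("by_ip = GROUP big BY ip;");
        pig.registerQuery("counts = FOREACH by_ip GENERATE group, COUNT(big);");

        // Pull the result back into the client for inspection
        Iterator<Tuple> it = pig.openIterator("counts");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        pig.shutdown();
    }
}
```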
Sqoop
- Installation
- Import Data (full table, only a subset, target directory, protecting the password, file formats other than CSV, compression, controlling parallelism, all-tables import)
- Incremental Import (importing only new data, last imported data, storing the password in the metastore, sharing the metastore between Sqoop clients)
- Free-Form Query Import
- Export data to RDBMS, Hive and HBase
- Hands-on Exercises.
HCatalog
- Installation.
- Introduction to HCatalog.
- Using HCatalog with Pig, Hive and MapReduce.
- Hands-on Exercises.
Flume
- Installation
- Introduction to Flume
- Flume Agents: Sources, Channels and Sinks
- Logging user information into HDFS from a Java program using Log4j and the Avro Source
- Logging user information into HDFS from a Java program using the Tail Source
- Logging user information into HBase from a Java program using Log4j and the Avro Source
- Logging user information into HBase from a Java program using the Tail Source
- Flume Commands
- Flume use case: stream Twitter data into HDFS and HBase, then analyze it with Hive and Pig (a Log4j event-generator sketch follows this section)
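A minimal sketch of the Java side of the Log4j/Avro-source use case above: the program only emits Log4j events, and the log4j.properties entries shown in the comments (an assumed configuration, with flume-ng-log4jappender on the classpath) would hand those events to a Flume Avro source for delivery to HDFS or HBase:

```java
import org.apache.log4j.Logger;

// Generates user events with Log4j; a Flume Log4jAppender configured in
// log4j.properties forwards them to an Avro source, which writes to HDFS/HBase.
public class UserEventLogger {
    private static final Logger LOG = Logger.getLogger(UserEventLogger.class);

    public static void main(String[] args) throws Exception {
        // Assumed log4j.properties entries:
        //   log4j.rootLogger=INFO, flume
        //   log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
        //   log4j.appender.flume.Hostname=localhost   (Avro source host)
        //   log4j.appender.flume.Port=41414           (Avro source port)
        for (int i = 0; i < 100; i++) {
            LOG.info("user=user" + i + " action=login");
            Thread.sleep(100);
        }
    }
}
```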
More Ecosystems
- Hue (Hortonworks and Cloudera).
Oozie
- Workflow (Start, Action, End, Kill, Fork and Join), Schedulers, Coordinators and Bundles.
- Workflow showing how to schedule Sqoop, Hive, MapReduce and Pig jobs.
- Real-world use case that finds the top websites used by users of certain age groups, scheduled to run every hour.
- ZooKeeper
- HBase Integration with Hive and Pig.
- Phoenix
- Proof of concept (POC).
Spark
- Overview
- Linking with Spark
- Initializing Spark
- Using the Shell
- Resilient Distributed Datasets (RDDs)
- Parallelized Collections
- External Datasets
- RDD Operations (a Java RDD sketch follows this section)
- Basics, Passing Functions to Spark
- Working with Key-Value Pairs
- Transformations
- Actions
- RDD Persistence
- Which Storage Level to Choose?
- Removing Data
- Shared Variables
- Broadcast Variables
- Accumulators
- Deploying to a Cluster
- Unit Testing
- Migrating from pre-1.0 Versions of Spark
- Where to Go from Here
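A minimal Java RDD sketch touching parallelized collections, a key-value transformation and an action, run in local mode for illustration; the sample data is made up:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkRddQuickstart {
    public static void main(String[] args) {
        // Local mode for experimentation; use spark-submit for a real cluster
        SparkConf conf = new SparkConf().setAppName("rdd-quickstart").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Parallelized collection -> pair transformation -> action
        JavaRDD<String> words = sc.parallelize(
                Arrays.asList("spark", "rdd", "spark", "action", "rdd", "spark"));
        JavaPairRDD<String, Integer> counts = words
                .mapToPair(w -> new Tuple2<>(w, 1))
                .reduceByKey((a, b) -> a + b);

        List<Tuple2<String, Integer>> result = counts.collect();
        for (Tuple2<String, Integer> t : result) {
            System.out.println(t._1() + ": " + t._2());
        }
        sc.stop();
    }
}
```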