This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark (including Spark Streaming and Spark SQL), Flume, Kafka, and Sqoop, this training course is the best preparation for the real-world challenges faced by Hadoop developers.
This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.
Introduction to Apache Hadoop and the Hadoop Ecosystem
• Apache Hadoop Overview
• Data Storage and Ingest
• Data Processing
• Data Analysis and Exploration
• Other Ecosystem Tools
• Introduction to the Hands-On Exercises
Apache Hadoop File Storage
• Problems with Traditional
• HDFS Architecture
• Using HDFS
• Apache Hadoop File Formats
Data Processing on an Apache Hadoop Cluster
• YARN Architecture
• Working With YARN
Importing Relational Data with Apache Sqoop
• Apache Sqoop Overview
• Importing Data
• Importing File Options
• Exporting Data
Apache Spark Basics
• What is Apache Spark?
• Using the Spark Shell
• RDDs (Resilient Distributed Datasets)
• Functional Programming in Spark
Working with RDDs
• Creating RDDs
• Other General RDD Operations
Aggregating Data with Pair RDDs
• Key-Value Pair RDDs
• Other Pair RDD Operations
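As a rough illustration of the pair-RDD aggregation topics above, the following plain-Python sketch mimics a word count built from key-value pairs. It does not use the Spark API; `reduce_by_key` is a hypothetical helper written here only to show the idea behind merging values per key.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Merge all values that share a key using func (hypothetical helper)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


lines = ["the cat sat", "the dog sat"]

# Split every line into words (the role flatMap plays in Spark).
words = [word for line in lines for word in line.split()]

# Turn each word into a (key, value) pair.
pairs = [(word, 1) for word in words]

# Sum the values for each key.
counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)
```

In a real Spark job the same steps would run in parallel across partitions; this sketch runs on a single machine and is meant only to show the data flow.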
Writing and Running Apache Spark Applications
• Spark Applications vs. Spark Shell
• Creating the SparkContext
• Building a Spark Application (Scala and Java)
• Running a Spark Application
• The Spark Application Web UI
Configuring Apache Spark Applications
• Configuring Spark Properties
Parallel Processing in Apache Spark
• Review: Apache Spark on a Cluster
• RDD Partitions
• Partitioning of File-Based RDDs
• HDFS and Data Locality
• Executing Parallel Operations
• Stages and Tasks
• RDD Lineage
• RDD Persistence Overview
• Distributed Persistence
Common Patterns in Apache Spark Data Processing
• Common Apache Spark Use Cases
• Iterative Algorithms in Apache Spark
• Machine Learning
• Example: k-means
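To give a feel for the iterative pattern behind the k-means example, here is a minimal single-machine sketch in plain Python. It is not the course's implementation and does not use Spark; the function names, sample points, and starting centers are assumptions chosen for illustration.

```python
def closest(point, centers):
    """Return the index of the center nearest to point (squared distance)."""
    return min(
        range(len(centers)),
        key=lambda i: (point[0] - centers[i][0]) ** 2
        + (point[1] - centers[i][1]) ** 2,
    )


def kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat assign-and-average until the centers stop moving."""
    for _ in range(max_iter):
        # Assignment step: group points by their nearest center.
        clusters = {}
        for p in points:
            clusters.setdefault(closest(p, centers), []).append(p)

        # Update step: move each center to the mean of its members.
        new_centers = list(centers)
        for i, members in clusters.items():
            new_centers[i] = (
                sum(x for x, _ in members) / len(members),
                sum(y for _, y in members) / len(members),
            )

        # Stop once the total squared movement is negligible.
        shift = sum(
            (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
            for a, b in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            break
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Because every pass reuses the same set of points, an algorithm of this shape benefits from keeping the data in memory between iterations.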
DataFrames and Spark SQL
• Apache Spark SQL and the SQL Context
• Creating DataFrames
• Transforming and Querying DataFrames
• Saving DataFrames
• DataFrames and RDDs
• Comparing Apache Spark SQL, Impala, and Hive-on-Spark
• Apache Spark SQL in Spark 2.x
Message Processing with Apache Kafka
• What is Apache Kafka?
• Apache Kafka Overview
• Scaling Apache Kafka
• Apache Kafka Cluster Architecture
• Apache Kafka Command Line Tools
Capturing Data with Apache Flume
• What is Apache Flume?
• Basic Flume Architecture
• Flume Sources
• Flume Sinks
• Flume Channels
• Flume Configuration
Integrating Apache Flume and Apache Kafka
• Use Cases
Apache Spark Streaming: Introduction to DStreams
• Apache Spark Streaming Overview
• Example: Streaming Request Count
• Developing Streaming Applications
Apache Spark Streaming: Processing Multiple Batches
• Multi-Batch Operations
• Time Slicing
• State Operations
• Sliding Window Operations
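The idea of a sliding window over batches can be sketched in plain Python as follows. This is an assumption-laden illustration, not Spark Streaming code: `windowed_counts` is a hypothetical helper, and the window length and slide are measured in batches rather than seconds.

```python
from collections import Counter


def windowed_counts(batches, window_len, slide):
    """Count items over the last window_len batches, every slide batches."""
    results = []
    for end in range(slide, len(batches) + 1, slide):
        # Early windows are shorter until enough batches have arrived.
        window = batches[max(0, end - window_len):end]
        total = Counter()
        for batch in window:
            total.update(batch)
        results.append(dict(total))
    return results


# Each inner list stands for the items received in one batch interval.
batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
results = windowed_counts(batches, window_len=2, slide=1)
for window_result in results:
    print(window_result)
```

Each result covers the two most recent batches, so an item counted in one window drops out again once its batch slides past the window's start.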
Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
Yes, we offer weekend classes for professionals, as group or 1-to-1 training, depending on the technology.
The administrative and sales staff work on weekdays (Monday to Friday). System administrators and the operations team are available on all days.
Yes, after you have paid the booking amount (which will be non-refundable in this case). The booking amount depends on the technology selected.
Training timings are from 9 am to 5 pm.
You can send the deposit by any of the following methods:
Overseas credit card payments through PayPal involve a mark-up of up to 4% as surcharge.
We can provide customized 1-to-1 training for a technology as per your requirement.
Most exams can be booked once you are on the course (e.g. Microsoft, ITIL, VEEAM, EC-Council). Red Hat and some other exams have to be booked in advance.
Our training centers are located in Bangalore, Chennai, Hyderabad, Pune and Delhi.
We do not offer payment in installments.
If the course fee has been paid and RPS cancels the course, a refund will be provided; otherwise, courses are non-refundable.
We do not provide a loan facility.