Cloudera Developer Training for Spark and Hadoop I (CDTSH1)

Course Content

Learn how to import data into your Apache Hadoop cluster and process it with Spark, Hive, Flume, Sqoop, Impala, and other Hadoop ecosystem tools.

This four-day hands-on training course delivers the key concepts and expertise you need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Using Hadoop ecosystem projects such as Spark, Hive, Flume, Sqoop, and Impala, the course prepares you for the real-world challenges Hadoop developers face. You will learn to identify the right tool for a given situation and gain hands-on experience developing with those tools.

Course Objectives

Through instructor-led discussion and interactive, hands-on exercises, you will learn Apache Spark and how it integrates with the broader Hadoop ecosystem, including:

  • How data is distributed, stored, and processed in a Hadoop cluster
  • How to use Sqoop and Flume to ingest data
  • How to process distributed data with Apache Spark
  • How to model structured data as tables in Impala and Hive
  • How to choose the best data storage format for different data usage patterns
  • Best practices for data storage
