O'Reilly - Hadoop Fundamentals for Data Scientists
Size: 7.72 GB Type: eLearning

Get a practical introduction to Hadoop, the framework that made big data and large-scale analytics possible by combining distributed computing techniques with distributed storage. In this video tutorial, hosts Benjamin Bengfort and Jenny Kim discuss the core concepts behind distributed computing and big data, and then show you how to work with a Hadoop cluster and program analytical jobs. You'll also learn how to use higher-level tools such as Hive and Spark.

Hadoop is a cluster computing technology that has many moving parts, including distributed systems administration, data engineering and warehousing methodologies, software engineering for distributed computing, and large-scale analytics. With this video, you'll learn how to operationalize analytics over large datasets and rapidly deploy analytical jobs with a variety of toolsets. Once you've completed this video, you'll understand how different parts of Hadoop combine to form an entire data pipeline managed by teams of data engineers, data programmers, data researchers, and data business people.
- Understand the Hadoop architecture and set up a pseudo-distributed development environment
- Learn how to develop distributed computations with MapReduce and the Hadoop Distributed File System (HDFS)
- Work with Hadoop via the command-line interface
- Use the Hadoop Streaming utility to execute MapReduce jobs in Python
- Explore data warehousing, higher-order data flows, and other projects in the Hadoop ecosystem
- Learn how to use Hive to query and analyze relational data using Hadoop
- Use summarization, filtering, and aggregation to move Big Data towards last mile computation
- Understand how analytical workflows including iterative machine learning, feature analysis, and data modeling work in a Big Data context
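
As a rough illustration of the Hadoop Streaming item in the list above, here is a minimal word-count sketch in Python. It is not taken from the course: the function names (`mapper`, `reducer`, `run_local`) are hypothetical, and the shuffle step is simulated with an in-process sort so the logic can be tried without a cluster. It assumes the usual streaming convention of tab-separated key/value lines.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per token, as a streaming mapper would."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(records):
    """Sum the counts for each word; records must arrive sorted by key."""
    parsed = (record.split("\t", 1) for record in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


# Example: count words across two short input lines.
for record in run_local(["the cat the hat", "The end"]):
    print(record)
```

In an actual streaming job, the mapper and reducer would typically live in separate scripts that read from standard input and write to standard output, with Hadoop performing the sort between the two stages.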
Benjamin Bengfort is a data scientist and programmer in Washington DC who prefers technology to politics but sees the value of data in every domain. Alongside his work teaching, writing, and developing large-scale analytics with a focus on statistical machine learning, he is finishing his PhD at the University of Maryland, where he studies machine learning and artificial intelligence.

Jenny Kim, a software engineer in the San Francisco Bay Area, develops, teaches, and writes about big data analytics applications and specializes in large-scale, distributed computing infrastructures and machine-learning algorithms to support recommendation systems.

01. Hadoop Fundamentals For Data Scientists
0101 Overview Of The Video Course

02. A Distributed Computing Environment
0201 The Motivation For Hadoop
0202 A Brief History Of Hadoop
0203 Understanding The Hadoop Architecture
0204 Setting Up A Pseudo-Distributed Environment
0205 The Distributed File System - HDFS
0206 Distributed Computing With MapReduce
0207 Word Count - The Hello World Of Hadoop

03. Computing With Hadoop
0301 How A MapReduce Job Works
0302 Mappers And Reducers In Detail
0303 Working With Hadoop Via The Command Line - Starting HDFS And Yarn
0304 Working With Hadoop Via The Command Line - Loading Data Into HDFS
0305 Working With Hadoop Via The Command Line - Running A MapReduce Job
0306 How To Use Our Github Goodies
0307 Working In Python With Hadoop Streaming
0308 Common MapReduce Tasks
0309 Spark on Hadoop 2
0310 Creating A Spark Application With Python

04. The Hadoop Ecosystem
0401 The Hadoop Ecosystem
0402 Data Warehousing With Hadoop
0403 Higher Order Data Flows
0404 Other Notable Projects

05. Working With Data On Hive
0501 Introduction To Hive
0502 Interacting With Data Via The Hive Console
0503 Creating Databases, Tables, And Schemas For Hive
0504 Loading Data Into Hive From HDFS
0505 Querying Data And Performing Aggregations With Hive

06. Towards Last Mile Computing
0601 Decomposing Large Data Sets To A Computational Space
0602 Linear Regressions
0603 Summarizing Documents With TF-IDF
0604 Classification Of Text
0605 Parallel Canopy Clustering
0606 Computing Recommendations Via Linear Log-Likelihoods
