Instructor

  • Instructor: Linh B. Ngo
  • Office: the Death Valley
  • Office Hours: See Palmetto Documentation page
  • Email: lngo@clemson.edu
  • Phone: 123 456 7890

Workshop Description

This workshop teaches how to use Apache Spark and Python to perform large-scale, in-memory data analytics. Learning outcomes include understanding the overall conceptual design of Spark and the advantages of using Spark over traditional Hadoop MapReduce. Participants will also learn to develop Spark programs in Python and to leverage Spark-specific capabilities such as SQLContext and DataFrame to assist with data analytics.

Prerequisites

This workshop requires:

  • Familiarity with Python (Python I/II workshops)
  • Familiarity with Palmetto (Introduction to Research Computing on Palmetto)

Course Outline

Topic                              Description

Setup                              Preparing for the course
1. Introduction to Apache Spark    What is Spark?
                                   How is programming done in Spark?
2. Launching the Spark cluster     How do I launch a Spark cluster at scale on Palmetto?
3. Workshop notebooks              How do I launch the workshop notebooks and link them to the Spark cluster?
Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.