• Instructor: Linh B. Ngo
  • Office: the Death Valley
  • Office Hours: See Palmetto Documentation page
  • Email:
  • Phone: 123 456 7890

Workshop Description

This workshop teaches how to use Apache Spark and Python to perform large-scale, in-memory data analytics. Learning outcomes include understanding the overall conceptual design of Spark and demonstrating the advantages of using Spark over traditional Hadoop MapReduce. Participants will also learn to develop Spark programs in Python and to use Spark-specific capabilities such as SQLContext and DataFrame to assist with data analytics.


This workshop requires:

  • Familiarity with Python (Python I/II workshops)
  • Familiarity with Palmetto (Introduction to Research Computing on Palmetto)

Course Outline

Setup: Preparing for the course
1. Introduction to Apache Spark
  • What is Spark?
  • How is programming done in Spark?

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.