Big Data Analytics in Python

  • Instructor: Linh B. Ngo

  • Office: 2092 Barre Hall

  • Office Hours: Wednesday, 8:30am – 11:30am

  • Email: lngo AT clemson DOT edu

Workshop Description

This workshop teaches how to use Apache Spark and Python to perform large-scale, in-memory data analytics. Learning outcomes include understanding the overall conceptual design of Spark and demonstrating the advantages of using Spark over traditional Hadoop MapReduce. Participants will also learn to develop Spark programs in Python and to use Spark-specific capabilities such as SQLContext and DataFrame for data analytics.

Prerequisites

This workshop requires:

  • Familiarity with Python (Python I/II workshops)

  • Familiarity with Palmetto (Introduction to Research Computing on Palmetto)