Big Data Analytics in Python
Instructor: Linh B. Ngo
Office: 2092 Barre Hall
Office Hours: Wednesday, 8:30am – 11:30am
Email: lngo AT clemson DOT edu
Workshop Description
This workshop teaches how to use Apache Spark and Python to perform large-scale, in-memory data analytics. Learning outcomes include understanding the overall conceptual design of Spark and demonstrating the advantages of using Spark over traditional Hadoop MapReduce. Participants will also learn to develop Spark programs in Python and to leverage Spark-specific capabilities such as SQLContext and DataFrame to assist with data analytics.
Prerequisites
This workshop requires:
Familiarity with Python (Python I/II workshops)
Familiarity with Palmetto (Introduction to Research Computing on Palmetto)