1. Where are the notebooks#
The notebooks for this workshop are included inside the `myspark` directory.
Double-click on the `myspark` directory in the File Browser tab.
We will go through the four notebooks today.
Double-click on `intro-to-pyspark-01.ipynb` to open the first notebook.
{: .slide}
2. Notebook configuration#
The first code cell of every notebook is the same.
This cell sets up the configuration to connect to the launched Spark cluster.
The following four configuration settings should be modified to match the values reported in the terminal:

`spark.driver.memory`
: value from *Memory per worker*

`spark.executor.instances`
: value from *Num workers*

`spark.executor.memory`
: value from *Memory per worker*

`spark.executor.cores`
: value from *Cores per worker*
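As a sketch, the configuration cell might look like the following. All values here are placeholders: the memory sizes, worker counts, and master URL must be replaced with whatever your terminal actually reported.

```python
import pyspark

# Placeholder values -- substitute the numbers reported in your terminal.
conf = (
    pyspark.SparkConf()
    .setMaster("spark://<host>:7077")       # Spark master URL from the terminal
    .set("spark.driver.memory", "4g")       # Memory per worker
    .set("spark.executor.instances", "4")   # Num workers
    .set("spark.executor.memory", "4g")     # Memory per worker
    .set("spark.executor.cores", "2")       # Cores per worker
)

# Connect to the launched Spark cluster with this configuration.
sc = pyspark.SparkContext(conf=conf)
```

The exact shape of the cell in the notebooks may differ; the point is that these four settings are passed to the `SparkConf` before the `SparkContext` is created.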
{: .slide}
3. Clean up#
Each notebook will open a separate SparkContext on the Spark cluster. This
SparkContext must be stopped (in the last cell of each notebook) when we are done.
{: .slide}
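Assuming the notebook binds its SparkContext to a variable named `sc` (the actual name is whatever the notebook uses), the final cell is simply:

```python
# Stop the SparkContext, releasing its executors back to the cluster
# so the next notebook can acquire them.
sc.stop()
```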
{% include links.md %}