Introduction to Data Science using R

Learning objectives

By the end of this workshop, you will be able to:

  1. Perform exploratory analysis on multiple data sets using the R programming language in a Jupyter notebook running on the Palmetto Supercomputer at Clemson University,
  2. Install additional R libraries into your account on Palmetto, enabling additional data analysis tools,
  3. Use dplyr to manipulate data,
  4. Use ggplot to visualize data,
  5. Optimize repetitive operations by writing functions and improving loops, and
  6. Understand and follow best practices for writing R code.


This workshop requires attendees to have an active account on the Palmetto Supercomputer.


These lessons are modeled on the structure of the Data Carpentry and Software Carpentry lesson materials, both of which are open source projects. Like Data Carpentry, we welcome contributions of all kinds: new lessons, fixes and improvements to existing material, typo corrections, bug reports, and reviews of proposed changes. Please see our page on Contributing to get started.


  1. Introduction to the R programming language
  2. Interacting with Data
  3. Creating Functions
  4. Analyzing Multiple Data Sets
  5. Making Choices
  6. Addressing Data
  7. Reading and Writing CSV Files
  8. Best Practices for Writing R
  9. Understanding Factors
  10. Data Types and Structures
  11. Installing R Community Packages
  12. Manipulating Data with dplyr
  13. Visualizing Data with ggplot
  14. Improving Loop Performance