Genomics Workshop

This workshop is intended for researchers in genomics and related fields who have little or no experience with the command line, large-scale computing, or computational genomics tools and workflows. Topics include: an introduction to the Linux command line; an introduction to large-scale computing and submitting jobs to a typical high-performance computing resource; running a typical workflow from raw data to variant calling for Illumina data; and strategies for running the same workflow on multiple files in parallel to take advantage of the cluster architecture.

Many parts of this lesson (including the website design) borrow from the lesson materials of Software Carpentry, an organization dedicated to teaching researchers and scientists the basic “lab skills” for computing. You may be interested in Software Carpentry’s other lessons on topics such as programming in R/Python and version control with Git/GitHub.

Prerequisites

Some knowledge of the Unix shell is desired, but not strictly necessary. Some knowledge of genetics is required; e.g., you should know what the terms “genome” and “mutation” refer to.

Checklist

  1. Obtain an account on the Palmetto Cluster, if you don’t have one already. Instructions for requesting an account are available here.

  2. Familiarize yourself with the procedure for logging in to the cluster by following the instructions here.

Schedule

Setup: Download files required for the lesson

Day 1

00:00  1. Introduction
       What is genomics?
       What are some current genomic methods?
       What should I expect from analyzing genomic data?
       Why learn about the command line?
       Why use a high-performance computing cluster?
       How do I log in to the cluster?
       Where can I get help?
00:00  2. Shell Basics
       How do I view the files and folders in the filesystem?
       How do I specify the location of a file or folder in the filesystem?
       How do I create, delete, move and rename files and folders?
       How do I create and edit text files?
00:30  3. Running programs using the shell
       How can I run a program from the shell?
       How can I save the result of a program?
       How can I combine programs together?
01:00  4. Interacting with the Palmetto Cluster
       How is using HPC different from using my laptop or workstation?
       What does the cluster “look” like?
       Where can I store data on the cluster?
       What software is available on the cluster?
       How do I reserve hardware on the cluster?
01:30  Finish

Day 2

00:00  5. More about the command line
       How do I search for things?
       How do I replace text in files?
       How do I install my own software?
       How do I use the scratch directories?
00:31  Finish

Day 3

00:00  6. More about Palmetto Cluster
       How do I install my own software?
       How do I use the scratch directories?
00:31  7. Navigating the NCBI database
       How do I navigate the NCBI database?
       How do I download an SRA file?
00:52  8. Variant calling workflow
01:14  9. Parallel Variant Calling Workflow
01:35  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.