# Introduction to Data Science using R

## Brief history of R

• Until the mid-70s, much of statistical computing work was being done using Fortran.
• In 1975-76, the S programming language was developed at Bell Labs to offer an alternative and more interactive approach to statistical computing.
• R was developed as an open source reimplementation of S, with version 1.0.0 released in 2000. R reached its fourth major version (4.0.0) in 2020.

## Why R

• R is written by statisticians, for statisticians.
• R has been widely adopted by the statistics community.
• R contains a wide range of statistical techniques, mainly due to the enthusiastic contributions from the user community. These include linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. It also has excellent tools to help with data acquisition, extraction, manipulation, and visualization.

We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1

We want to:

• load that data into memory,
• calculate the average inflammation per day across all patients, and
• plot the result.
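
The three steps above can be sketched in base R as follows. This is a minimal illustration, not the lesson's definitive solution: the five sample rows are embedded as text so the example runs on its own, and the variable names `csv_text`, `dat`, and `avg_day_inflammation` are placeholders chosen here. With a real file, you would pass its path to `read.csv()` instead of the `text` argument.

```r
# Assumption: the five sample rows shown earlier, embedded as a string so the
# sketch is self-contained. Replace with a file path for real data.
csv_text <- "0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1"

# Step 1: load the data into memory.
# header = FALSE because the data has no header row; every line is a patient.
dat <- read.csv(text = csv_text, header = FALSE)

# Step 2: average inflammation per day across all patients.
# Rows are patients and columns are days, so we take the mean of each column.
avg_day_inflammation <- colMeans(dat)

# Step 3: plot the result as a line, one point per day.
plot(avg_day_inflammation,
     type = "l",
     xlab = "Day",
     ylab = "Average inflammation")
```

`colMeans(dat)` is used here for brevity; `apply(dat, 2, mean)` computes the same per-column averages and generalizes to other summary functions such as `max` or `min`.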