AnalyticsDojo

Introduction to R - Test Local Jupyter Notebook

rpi.analyticsdojo.com

Test Notebook

The goal of this notebook is to simply test the R environment and to show how to interact with a local data file. Let’s first check the R version.

version

               _                           
platform       x86_64-apple-darwin14.5.0   
arch           x86_64                      
os             darwin14.5.0                
system         x86_64, darwin14.5.0        
status                                     
major          3                           
minor          5.1                         
year           2018                        
month          07                          
day            02                          
svn rev        74947                       
language       R                           
version.string R version 3.5.1 (2018-07-02)
nickname       Feather Spray               

Reading a Local CSV File

  • The read.csv command can accept a variety of delimited files.
  • Relative references use .. to indicate going up one directory.
  • Windows: use either \\ or / to indicate directories. A single \ inside an R string starts an escape sequence, so it must be doubled.
  • setwd('C:\\Users\\Your_username\\Desktop\\r-bootcamp-2016')
  • setwd('..\\r-bootcamp-2016')
# This will load the local iris.csv file into an R data frame. We will work with data frames a lot in the future.
# This is referred to as a relative reference: the path is relative to the current working directory.
frame=read.csv(file="../../input/iris.csv", header=TRUE, sep=",")

# This will print out the dataframe.
frame

sepal_length sepal_width petal_length petal_width species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
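
As noted earlier in this section, read.csv can handle more than comma-separated input. The following is a minimal sketch, assuming a small made-up tab-delimited file written to a temporary location (the file, its contents, and the name `small` are illustrative and not part of the course data). It also shows head() and str() as a more compact way to inspect a data frame than printing every row.

```r
# Hypothetical example data: write a small tab-delimited file to a temp location.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("sepal_length\tspecies",
             "5.1\tsetosa",
             "4.9\tsetosa",
             "6.9\tvirginica"), tmp)

# read.csv() with sep = "\t" reads tab-delimited input.
small <- read.csv(file = tmp, header = TRUE, sep = "\t")

# head() shows the first rows; str() summarizes column names and types.
head(small, 2)
str(small)
```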

Writing data out from R

Here you have a number of options.

1) You can write out R objects to an R Data file, as we’ve seen, using save() and save.image().

2) You can use write.csv() and write.table() to write data frames/matrices to flat text files with delimiters such as comma and tab.

3) You can use write() to write out matrices in a simple flat text format.

4) You can use cat() to write to a file, while controlling the formatting to a fine degree.

5) You can write out in the various file formats mentioned on the previous slide.
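
The first four options can be sketched as follows. This is only an illustration under stated assumptions: the objects `df` and `m`, the file names, and the use of a temporary directory are all made up for the example.

```r
# Illustrative objects; the names and values are made up for this sketch.
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
m <- matrix(1:6, nrow = 2)
out_dir <- tempdir()  # write to a temporary directory to avoid clutter

# 1) R data file: save() stores the object itself; load() restores it by name.
rda_path <- file.path(out_dir, "df.RData")
save(df, file = rda_path)

# 2) Delimited text: write.table() with a tab separator.
tsv_path <- file.path(out_dir, "df.tsv")
write.table(df, file = tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# 3) write() emits a matrix as flat text. It walks the data in column-major
#    order, so transpose first to keep the matrix layout in the file.
mat_path <- file.path(out_dir, "m.txt")
write(t(m), file = mat_path, ncolumns = ncol(m))

# 4) cat() gives fine control over the exact text written.
cat_path <- file.path(out_dir, "note.txt")
cat("rows:", nrow(df), "\n", file = cat_path)
```

Which option fits depends on who reads the file next: save() is convenient when the data stays in R, while the text formats are easier to share with other tools.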

#Writing Dataframe to a file. 
write.csv(frame, file= "iris2.csv")

#Kaggle won't want the rownames. 
write.csv(frame, file = "iris3.csv",row.names=FALSE)
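
To see what row.names=FALSE changes, one option is to write a small data frame both ways and read the files back. This is a sketch with a hypothetical two-row data frame and temporary files, not the iris data used above.

```r
# Hypothetical two-row data frame, used only for illustration.
df <- data.frame(x = c(1.5, 2.5), y = c("a", "b"))

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(df, file = with_rn)                        # row names are written
write.csv(df, file = without_rn, row.names = FALSE)  # row names are dropped

# Reading the files back shows the difference in columns.
names(read.csv(with_rn))     # includes an extra leading column for the row names
names(read.csv(without_rn))  # only the original columns
```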

setwd('/Users/jasonkuruzovich/githubdesktop/0_class/techfundamentals-spring2018-materials/classes/')

The Working Directory

  • To read and write from R, you need to have a firm grasp of where in the computer’s filesystem you are reading and writing from.
  • It is common to set the working directory, and then just list the specific file without a path.
  • Windows: use either \\ or / to indicate directories (a backslash must be doubled inside an R string).
  • setwd('C:\\Users\\Your_username\\Desktop\\r-bootcamp-2016')
  • setwd('..\\r-bootcamp-2016')
# While this example uses Docker, it should work similarly on Mac or Windows machines.
setwd("/home/jovyan/techfundamentals-fall2017-materials/classes/05-intro-r")
getwd()  # what directory will R look in?
setwd("/home/jovyan/techfundamentals-fall2017-materials/classes/input") # change the working directory
getwd() 
setwd("/home/jovyan/techfundamentals-fall2017-materials/classes/05-intro-r")
setwd('../input') # This is an alternate way of moving to data.
getwd()

'/home/jovyan/techfundamentals-fall2017-materials/classes/05-intro-r'
'/home/jovyan/techfundamentals-fall2017-materials/classes/input'
'/home/jovyan/techfundamentals-fall2017-materials/classes/input'
#Notice path isn't listed. We don't have to list the path if we have set the working directory

frame=read.csv(file="iris.csv", header=TRUE,sep=",")
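
Changing the working directory affects every later read and write in the session, so it can help to either avoid changing it or to restore it afterwards. A minimal sketch of both approaches, assuming a temporary directory and a made-up example.csv that stand in for a real data folder:

```r
# A temporary directory stands in for a real data folder in this sketch.
data_dir <- tempfile("data_")
dir.create(data_dir)
write.csv(data.frame(a = 1:2), file.path(data_dir, "example.csv"),
          row.names = FALSE)

# Option 1: build the full path with file.path() and leave the
# working directory alone.
d1 <- read.csv(file.path(data_dir, "example.csv"))

# Option 2: setwd() invisibly returns the previous directory,
# so it can be saved and restored after reading.
old_wd <- setwd(data_dir)
d2 <- read.csv("example.csv")
setwd(old_wd)
```

file.path() also picks the separator for you, which sidesteps the \\ versus / question on Windows.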

Exercises

Basics

1) Make sure you are able to install packages from CRAN. E.g., try to install lmtest.

2) Figure out what your current working directory is.

Using the ideas

3) Put the data/iris.csv file in some other directory. Use setwd() to set your working directory to be that directory. Read the file in using read.csv(). Now use setwd() to point to a different directory. Write the data frame out to a file without any row names and without quotes on the character strings.

CREDITS

Copyright AnalyticsDojo 2016. This work is licensed under the Creative Commons Attribution 4.0 International license agreement. Work adapted from the Berkeley R Bootcamp.