♦ The R programming language is generally used for statistical analysis, graphical representation, and reporting.
The <- symbol is the assignment operator.
The # symbol starts a comment that runs to the end of the line.
# x equals 1
x <- 1
# msg equals "hello"
msg <- "hello"
The basic arithmetic operations using R
# addition
18 + 12
# subtraction
18 - 12
# multiplication
18 * 12
# division
18 / 12
# just the integer part of the quotient
18 %/% 12
# just the remainder part (modulo)
18 %% 12
# exponentiation (raising to a power)
18 ^ 12
# natural log (base e)
log(10)
# base 10 logs
log10(100)
# square root
sqrt(88)
# absolute value
abs(18 / -12)
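As a quick check on how %/% and %% relate, the sketch below rebuilds the dividend from the quotient and the remainder. The variable names (a, b, quotient, remainder, rebuilt) are chosen for this illustration and do not appear in the notes above.

```r
# Illustrative values; the names a and b are assumptions of this sketch
a <- 18
b <- 12

quotient  <- a %/% b   # integer part of a / b
remainder <- a %% b    # what is left over after the integer division

# The dividend can be rebuilt from the two parts
rebuilt <- quotient * b + remainder

print(quotient)    # 1
print(remainder)   # 6
print(rebuilt)     # 18
```

The same identity holds for other positive values of a and b, which makes it a handy sanity check when the two operators are used together.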
Defining vectors in R
# Method 1
# numeric vector
x <- c(0.5, 0.6)
# complex vector
x <- c(1+0i, 2+4i)
# Method 2 - Use the vector() function to initialize vectors
x <- vector("numeric", length = 10)
# Method 3 - Create a vector from a number sequence
# number sequence 1, 2, 3, ..., 10
1:10
# number sequence from 1 to 10 with an interval of 1, stored as my.seq
my.seq <- seq(from = 1, to = 10, by = 1)
# Some useful functions
# check the data type of my.seq
class(my.seq)
# check whether my.seq is a vector
is.vector(my.seq)
# divide each element of my.seq by 3
my.seq <- my.seq / 3
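The sketch below pulls the sequence helpers together in one runnable piece. It assumes the sequence is stored in my.seq, and the name scaled is introduced here only for illustration. It also shows that 1:10 and seq(..., by = 1) produce the same values with different storage types.

```r
# Two ways to build the numbers 1 to 10
int.seq <- 1:10                            # the colon operator gives integers
my.seq  <- seq(from = 1, to = 10, by = 1)  # seq() with a numeric step gives doubles

class(int.seq)      # "integer"
class(my.seq)       # "numeric"
is.vector(my.seq)   # TRUE
length(my.seq)      # 10

# Arithmetic is applied element by element
scaled <- my.seq / 3
scaled[3]           # 1, because the third element is 3 / 3
```

Keeping the divided values in a separate variable, rather than overwriting my.seq, leaves the original sequence available for later steps.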
Defining matrices in R
# Method 1
# create an empty 2 by 3 matrix
m <- matrix(nrow = 2, ncol = 3)
# check the dimensions
dim(m)
# Method 2
# matrices are filled column-wise
m <- matrix(1:6, nrow = 2, ncol = 3)
# Method 3
# create a matrix directly from a vector by adding a dimension attribute
m <- 1:10
dim(m) <- c(2, 5)
# Method 4
x <- 1:3
y <- 10:12
# create a matrix by column-binding
cbind(x, y)
# create a matrix by row-binding
rbind(x, y)
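To make the column-wise filling concrete, the sketch below inspects individual rows and columns of the 2 by 3 matrix. The byrow = TRUE variant and the name m.byrow are additions for comparison and are not part of the notes above.

```r
# Filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
m[, 1]    # first column: 1 2
m[1, ]    # first row:    1 3 5
m[2, 3]   # element in row 2, column 3: 6

# Filled row by row instead, using the byrow argument of matrix()
m.byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m.byrow[1, ]   # first row: 1 2 3

dim(m)    # 2 3
```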
Defining lists in R
♦ Lists are a special type of vector that can contain elements of different classes.
# Method 1
# this list contains elements of different classes
x <- list(1, "a", TRUE, 1 + 4i)
# Method 2
# create an empty list of length 5
x <- vector("list", length = 5)
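A short sketch of how the elements of such a list can be inspected. The double-bracket and single-bracket indexing shown here is standard R but is not covered in the notes above, and the names classes and empty are illustrative.

```r
x <- list(1, "a", TRUE, 1 + 4i)

# Each element keeps its own class
classes <- sapply(x, class)
classes         # "numeric" "character" "logical" "complex"

# [[ ]] extracts the element itself; [ ] returns a shorter list
class(x[[2]])   # "character"
class(x[2])     # "list"

# An empty list of length 5 holds NULL in every slot
empty <- vector("list", length = 5)
length(empty)         # 5
is.null(empty[[1]])   # TRUE
```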
Defining factors in R
♦ Factors are used to represent categorical data and can be unordered or ordered.
♦ Factors are important in statistical modelling.
# Levels are put in alphabetical order by default
x <- factor(c("yes", "yes", "no", "yes", "no"))
# table() shows how many "yes" and "no" values there are
table(x)
# Levels are put in the order given by the levels argument
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
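The effect of the levels argument is easiest to see by comparing the two factors side by side. In this sketch the names answers, f.default, f.explicit, and counts are illustrative.

```r
answers <- c("yes", "yes", "no", "yes", "no")

# Default: levels sorted alphabetically
f.default <- factor(answers)
levels(f.default)        # "no" "yes"

# Explicit: levels in the order supplied
f.explicit <- factor(answers, levels = c("yes", "no"))
levels(f.explicit)       # "yes" "no"

# Internally each value is stored as the index of its level
as.integer(f.explicit)   # 1 1 2 1 2

# The counts are the same either way; only the ordering changes
counts <- table(f.explicit)
counts[["yes"]]          # 3
counts[["no"]]           # 2
```

The level order matters in modelling because the first level is usually treated as the baseline.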
♦ Missing values are denoted by NA or NaN. NA represents a missing value, and NaN ("not a number") represents an undefined numeric result such as 0/0.
# define a vector with a missing value
x <- c(1, 2, NA, 10, 3)
# check each element of the vector for NA values
is.na(x)
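The difference between NA and NaN can be sketched as follows. The vector y and the use of is.nan() and na.rm = TRUE are additions for illustration; both are part of base R.

```r
x <- c(1, 2, NA, 10, 3)
y <- c(1, 0/0, NA)       # 0/0 evaluates to NaN

# is.na() flags both NA and NaN; is.nan() flags only NaN
is.na(y)                 # FALSE TRUE TRUE
is.nan(y)                # FALSE TRUE FALSE

# Count the missing values in x
sum(is.na(x))            # 1

# Most summaries return NA unless missing values are dropped
mean(x)                  # NA
mean(x, na.rm = TRUE)    # 4
```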
♦ Data frames are used to store tabular data in R.
♦ Data frames are represented as a special type of list where every element of the list has to have the same length.
♦ Unlike matrices, data frames can store different classes of objects in each column.
♦ In addition to column names, which give the names of the variables or predictors, data frames have a special attribute called row.names that holds information about each row of the data frame.
# Define a data frame in R
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
# Show the number of rows
nrow(x)
# Show the number of columns
ncol(x)
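The points above about column classes and row.names can be illustrated with the same small data frame. The row labels "a" to "d" are invented for this sketch.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Each column keeps its own class
sapply(x, class)      # foo: "integer", bar: "logical"

# Default row names are "1", "2", ...; they can be replaced
row.names(x)          # "1" "2" "3" "4"
row.names(x) <- c("a", "b", "c", "d")   # hypothetical labels

# Rows can then be addressed by name
x["b", "foo"]         # 2

# A logical column can be used to pick rows of another column
x$foo[x$bar]          # 1 2
```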
Managing Data Frames with the dplyr package
♦ The data frame is a key data structure in statistics and in R.
♦ The dplyr package is designed for manipulating data frames, including filtering, re-ordering, and collapsing.
# Install the dplyr package (needed only once)
install.packages("dplyr")
# Load the dplyr package into your R session
library(dplyr)
# Load the chicago.rds file
chicago <- readRDS('C:\\Users\\ahilan\\Dropbox\\Elect_dept_UOJ\\Statistics using R\\chicago.rds')
# Show the number of rows and columns, then the structure of the data
dim(chicago)
str(chicago)
# The select() function selects columns of a data frame.
# Suppose we want to take the first 3 columns only.
names(chicago)[1:3]
subset <- select(chicago, city:dptp)
head(subset)
# Keep every variable whose name ends with a "2"
subset <- select(chicago, ends_with("2"))
# Variables can also be omitted with select() by negating the selection
select(chicago, -(city:dptp))
# Keep every variable whose name starts with a "d"
subset <- select(chicago, starts_with("d"))
# The filter() function extracts subsets of rows from a data frame.
# Extract the rows where PM2.5 is greater than 30
chic.f <- filter(chicago, pm25tmean2 > 30)
str(chic.f)
summary(chic.f$pm25tmean2)
# Extract the rows where PM2.5 is greater than 30 and temperature is greater than 80 degrees Fahrenheit
chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80)
str(chic.f)
summary(chic.f$pm25tmean2)
# The arrange() function reorders rows of a data frame according to one of the variables/columns.
# Order the rows by date, so that the first row is the earliest (oldest) observation and the last row is the latest (most recent) observation
chicago <- arrange(chicago, date)
# Use desc() to order the rows by date in descending order instead
chicago <- arrange(chicago, desc(date))
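Because chicago.rds is not always at hand, the same verbs can be tried on a small stand-in data frame. The sketch below assumes dplyr is already installed; the data frame toy and all of its values are invented for illustration and only reuse four of the column names seen above.

```r
library(dplyr)

# Hypothetical stand-in for the chicago data; the values are made up
toy <- data.frame(
  city       = rep("chic", 4),
  tmpd       = c(31.5, 85.0, 82.0, 60.0),
  date       = as.Date(c("2005-01-03", "2005-07-02", "2005-07-01", "2005-04-10")),
  pm25tmean2 = c(12.0, 35.5, 28.0, 40.0)
)

# select(): keep a range of columns by name
first.cols <- select(toy, city:tmpd)
names(first.cols)        # "city" "tmpd"

# filter(): keep rows that satisfy a condition
high.pm <- filter(toy, pm25tmean2 > 30)
nrow(high.pm)            # 2

# filter() with two conditions combined by &
hot.high.pm <- filter(toy, pm25tmean2 > 30 & tmpd > 80)
nrow(hot.high.pm)        # 1

# arrange(): oldest first, then newest first with desc()
by.date      <- arrange(toy, date)
newest.first <- arrange(toy, desc(date))
by.date$date[1]          # "2005-01-03"
newest.first$date[1]     # "2005-07-02"
```

Each verb returns a new data frame and leaves toy unchanged, which is why the results are assigned to new names.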
Logical operations in R
# define a vector using boolean values
a <- c(TRUE, FALSE, FALSE, TRUE)
# define a numeric vector
b <- c(13, 7, 8, 2)
# select the elements of b where a is TRUE
b[a]    # 13 2
# negate a
!a      # FALSE TRUE TRUE FALSE
# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(a)  # 2
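In practice the logical vector usually comes from a comparison rather than being typed by hand. The sketch below builds one from b and combines it with a. The name big and the threshold of 7 are chosen for illustration.

```r
a <- c(TRUE, FALSE, FALSE, TRUE)
b <- c(13, 7, 8, 2)

# A comparison returns a logical vector of the same length
big <- b > 7
big          # TRUE FALSE TRUE FALSE

# Use it to select elements, or find their positions with which()
b[big]       # 13 8
which(big)   # 1 3

# Element-wise AND and OR
a & big      # TRUE FALSE FALSE FALSE
a | big      # TRUE FALSE TRUE TRUE

# Counting and proportion of TRUE values
sum(big)     # 2
mean(big)    # 0.5
```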
Built-in help and search functions
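# run the examples from the help page for mean()
example(mean)
# search the help system for a keyword
help.search("optimization")
# open the help page for mean() (R is case-sensitive, so the function is help, not Help)
help(mean)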
Data input and output
# Changing directories
# Change the default to the mydata folder on the C: drive
setwd("c:\\mydata")
# Save the objects for a future session
dump("usefuldata", "useful.R")
# Retrieve the saved objects
source("useful.R")
# Save all of the objects that you have created during a session
dump(list = objects(), "all.R")
# Redirecting R output to a text file
# Create a file solarmean.txt for output
sink("solarmean.txt")
# Write the mean value to solarmean.txt
mean(solar.radiation)
# Close solarmean.txt; print new output to the screen
sink()
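The save, restore, and redirect steps can be tried end to end without touching the working directory. This is a minimal sketch that assumes temporary files are acceptable: the values in usefuldata are invented, and dump.file, out.file, and captured are illustrative names.

```r
# Temporary files keep the sketch from writing into the working directory
dump.file <- tempfile(fileext = ".R")
out.file  <- tempfile(fileext = ".txt")

# Hypothetical object to save; the values are made up
usefuldata <- c(3.2, 4.8, 5.1)

# Save the object as R code, remove it, then restore it
dump("usefuldata", file = dump.file)
rm(usefuldata)
source(dump.file, local = TRUE)   # recreates usefuldata in the current environment

# Redirect printed output to a text file
sink(out.file)
print(mean(usefuldata))   # explicit print() so the value is written even when the script is sourced
sink()                    # output goes back to the screen

# Read the file back to confirm what was written
captured <- readLines(out.file)
captured                  # "[1] 4.366667"
```

Calling sink() with no arguments is what closes the redirection; forgetting it leaves later output going to the file.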