Learning R (Programming Basics)

For loop in R

# Example 1
# print 1, 2, 3...10
for(i in 1:10) {
print(i)
}

# Example 2
# print a, b, c, d
x <- c("a", "b", "c", "d")
for(i in 1:4) {
print(x[i])
}

Nested for loop
A nested for loop is a for loop placed inside the body of another for loop.

# Example
# add up the elements of a 3 by 3 matrix and print the running total
M <- matrix(1:9, ncol = 3)
total <- 0
for (i in seq_len(nrow(M))) {
  for (j in seq_len(ncol(M))) {
    total <- total + M[i, j]
    print(total)
  }
}

While loop
A while loop repeats its body as long as its condition is TRUE.

# Example
i <- 1
# When i >= 8, the loop terminates
while (i < 8) {
  print(i)
  i <- i + 1
}

If statement
This structure allows you to test a condition and act on it depending on whether
it’s true or false.

#Example
# if the random number is greater than 3, print 10; otherwise print 0
x <- runif(1, 0, 10)
print(x)
# else must be on the same line as the closing brace of the if block
if (x > 3) {
  print(10)
} else {
  print(0)
}

Ifelse statement
The ifelse() function is the vectorized form of the if-else statement: it tests the condition on each element of a vector individually.

# Example
# ifelse() checks each element of the vector; if it is odd,
# it is labelled as an odd number, otherwise as an even number
x <- 1:8
ifelse(x %% 2 == 1, paste0(x, " : odd number"), paste0(x, " : even number"))

Next statement
The next statement skips the rest of the current iteration and moves on to the following one.

#Example 1
# the next statement skips the first 20 iterations
x <- numeric(100)
for (i in 1:100) {
  if (i <= 20) {
    next
  }
  x[i] <- i
}

Break statement
Break is used to exit a loop immediately, regardless of what iteration the loop may be on.

#Example 2
# stop the loop after 20 iterations
for (i in 1:100) {
  print(i)
  if (i >= 20) {
    break
  }
}

Repeat
A repeat loop runs indefinitely; a break statement inside the loop is used to exit it.
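
A minimal sketch of a repeat loop. The counter name count and the limit of 5 are illustrative choices, not taken from the text above.

```r
# Illustrative counter (hypothetical name)
count <- 0
repeat {
  count <- count + 1
  print(count)
  # Without this break the loop would never end
  if (count >= 5) {
    break
  }
}
```

The loop prints 1 to 5 and then stops, because break is reached on the fifth pass.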

apply
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
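
A short sketch of apply() on a small matrix may help here. The matrix m and the result names are assumptions made for illustration; the second argument (MARGIN) selects rows (1) or columns (2).

```r
# Illustrative 2 x 3 matrix, filled column-wise
m <- matrix(1:6, nrow = 2)
# MARGIN = 1 applies the function to each row
row.sums <- apply(m, 1, sum)
# MARGIN = 2 applies the function to each column
col.sums <- apply(m, 2, sum)
print(row.sums)
print(col.sums)
```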

lapply
lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
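
As a small illustration, the sketch below calls lapply() on a list of numeric vectors. The list scores and its element names are hypothetical.

```r
# Illustrative list of numeric vectors (hypothetical names)
scores <- list(a = 1:3, b = c(10, 20), c = 5)
# lapply() calls mean() on each element and returns a list
means <- lapply(scores, mean)
print(means)
# The result has the same length as the input
length(means) == length(scores)
```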

Writing functions in R
Functions are defined using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class “function”.

#Example 1
func1 <- function() {
  print("Hello, world!")
}

func1()

#Example 2
func2 <- function(num) {
  for (i in seq_len(num)) {
    print("Hello, world!")
  }
}
func2(4)

Function with return
The return() function returns a value immediately from a function, skipping any code that follows it.

#Example
#counting odd numbers from a vector
oddcount <- function(x) {
  return(length(which(x%%2==1)))
}

R Programming Environment
An environment is a collection of objects (functions, variables, and so on).

Global variables
Global variables are variables that exist throughout the execution of a program. They can be read from any part of the program; to change one from inside a function, use the <<- operator.

Local variables
Local variables are variables that exist only within a certain part of a program, such as a function, and are released when the function call ends.
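
The difference can be sketched as follows. All names (counter, show.scope, bump.counter, local.value) are hypothetical and chosen only for illustration.

```r
# Global variable (illustrative name)
counter <- 10

show.scope <- function() {
  # Local variable: exists only while the function runs
  local.value <- 5
  # The function can read the global variable
  counter + local.value
}

bump.counter <- function() {
  # <<- assigns to the variable in the enclosing environment
  counter <<- counter + 1
}

result <- show.scope()
print(result)
bump.counter()
print(counter)
# local.value is gone after the call ends
print(exists("local.value"))
```

A plain <- inside bump.counter() would create a new local variable instead of changing counter, which is why <<- is used there.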

Taking input from user

#Example
my.age <- readline(prompt="Enter age: ")
# Convert to integer
my.age <- as.integer(my.age)

Recursion
Recursion is a technique in which a function calls itself.

#Example 1
#Finding factorial - n! = n*(n-1)! 
recursive.factorial <- function(x) {   
  if (x == 0)    return (1)   
  else    return (x * recursive.factorial(x-1))
}

#Example 2
#The Fibonacci sequence is a series of numbers
#where a number is found by adding up the two numbers before it.
#Starting with 0 and 1, the sequence goes 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.

recurse_fibonacci <- function(n) {    
  if(n <= 1)    return(n) 
  else    return(recurse_fibonacci(n-1) + recurse_fibonacci(n-2))
} 

for(i in 0:(12-1)) {    
  print(recurse_fibonacci(i))
}

Algorithm analysis
An algorithm is evaluated on the following attributes:
◙ Shorter running time
◙ Lower memory utilization

Memory management in R
R allocates different amounts of memory to different objects in its environment. The memory allocated to an object can be measured with the object_size function from the pryr package.
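
As a dependency-free alternative to pryr, the sketch below uses base R's object.size() instead of object_size; this is a swapped-in function, not the one named above, and the two may report slightly different numbers. The vector names are illustrative.

```r
# Two illustrative vectors of different lengths
small.vec <- numeric(10)
large.vec <- numeric(10000)
# object.size() reports an estimate of the memory used, in bytes
small.size <- object.size(small.vec)
large.size <- object.size(large.vec)
print(small.size)
print(large.size)
# format() can show the size in other units
format(large.size, units = "Kb")
```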

System runtime in R
Measuring system runtime helps to compare different algorithms and pick the best one. The microbenchmark package on CRAN measures the runtime of any expression, function, or piece of code with sub-millisecond accuracy.
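
For a rough comparison without installing anything, base R's system.time() can stand in for microbenchmark; it is coarser and is used here only as a sketch. The two functions and the input size n are assumptions made for illustration.

```r
# Illustrative workload: sum of squares computed two ways
n <- 100000
loop.version <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}
vector.version <- function(n) {
  sum(seq_len(n)^2)
}
# system.time() reports user, system, and elapsed time in seconds
loop.time <- system.time(loop.result <- loop.version(n))
vector.time <- system.time(vector.result <- vector.version(n))
print(loop.time)
print(vector.time)
```

The timings vary between machines and runs, so they should be read as a rough indication rather than a precise measurement.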

Algorithm asymptotic analysis
Asymptotic notations are commonly used to describe how the runtime of an algorithm grows with the size of its input. Big O (upper bound), Big Omega (lower bound), and Big Theta (tight bound, both upper and lower) are the simplest forms of functional equations, which represent an algorithm’s growth rate or its system runtime.

Assignment operator
Assigning an element (numeric, character, complex, or logical) to an object requires a constant amount of time. The asymptote (Big Theta notation) of the assignment operation is θ(1).

Simple for loop
A for loop that runs n times with a constant-time body has a total cost of θ(n).

Nested loop
Two nested loops that each run n times have a total cost of θ(n^2).
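
One way to see these costs is to count how often the loop body runs. The helper names count.simple and count.nested are hypothetical.

```r
# Count how many times the body of a single loop runs
count.simple <- function(n) {
  steps <- 0
  for (i in seq_len(n)) {
    steps <- steps + 1
  }
  steps
}
# Count how many times the body of a doubly nested loop runs
count.nested <- function(n) {
  steps <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      steps <- steps + 1
    }
  }
  steps
}
count.simple(10)   # 10
count.nested(10)   # 100
```

Multiplying n by 10 multiplies the first count by 10 and the second by 100, which matches linear and quadratic growth.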

Writing sorting algorithms in R
Bubble sort
Bubble sort is a simple comparison-based sorting algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are not in order.

bubblesort <- function(x) {
  if (length(x) < 2) 
    return (x)
  # last is the last element to compare with
  for(last in length(x):2) {  
    for(first in 1:(last - 1)) {    
      if(x[first] > x[first + 1]) {      
      # swap the pair      
        save <- x[first]      
        x[first] <- x[first + 1]      
        x[first + 1] <- save    
      }
    }
  }
  return(x)
}

Quick sort
Quick sort involves the following steps:
◙ Pick an element, called a pivot, from the array.
◙ Partitioning: reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
◙ Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.

quickSort <- function(vect) {    
  if (length(vect) <= 1) {
    return(vect)
  }
  # Pick an element from the vector  
  element <- vect[1]  
  partition <- vect[-1]  
  # Reorder vector so that integers less than element  
  # come before, and all integers greater come after.  
  v1 <- partition[partition < element]  
  v2 <- partition[partition >= element]  
  # Recursively apply steps to smaller vectors.  
  v1 <- quickSort(v1)  
  v2 <- quickSort(v2)  
  return(c(v1, element, v2))
}

Learning R (Introduction)

♦ The R programming language is generally used for statistical analysis, graphical representation, and reporting.

<- symbol is the assignment operator.
# symbol is used to comment a line

# x equals 1
x<-1
# msg equals hello
msg<- "hello"

The basic arithmetic operations using R

# addition
18 + 12
# subtraction
18 - 12
# multiplication
18 * 12
# division    
18 / 12 
# just the integer part of the quotient
18 %/% 12
# just the remainder part (modulo)
18 %% 12
# exponentiation (raising to a power)
18 ^ 12
# natural log (base e)
log(10)
# base 10 logs
log10(100)
# square root    
sqrt(88)
# absolute value
abs(18 / -12)

Defining vectors in R

# Method 1
# numeric vector
x <- c(0.5, 0.6)
# complex vector
x <- c(1+0i, 2+4i)    

# Method 2 - Use the vector() function to initialize vectors
x <- vector("numeric", length = 10)

# Method 3 - Creating a vector of numbers
# number sequence 1, 2, 3,.... 10
1:10
# number sequence from 1 to 10 with an interval of 1, stored as my.seq
my.seq <- seq(from = 1, to = 10, by = 1)

# Some useful functions
# to check the type of data of my.seq
class(my.seq)
# to check whether my.seq is a vector
is.vector(my.seq)
# divide each element of my.seq by 3
my.seq <- my.seq / 3

Defining matrices in R

# Method 1
# To create empty 2 by 3 matrix
m <- matrix(nrow = 2, ncol = 3)
# To check the dimensionality 
dim(m)

# Method 2 
# Matrices are constructed column-wise
m <- matrix(1:6, nrow = 2, ncol = 3)

# Method 3
#Matrix created directly from vectors by adding a dimension attribute.
m <- 1:10
dim(m) <- c(2, 5)

# Method 4
x <- 1:3
y <- 10:12
# create matrix by column-binding
cbind(x, y)
# create matrix by row-binding 
rbind(x, y)

Defining lists in R
Lists are a special type of vector that can contain elements of different classes.

# Method 1
# this list contains different class of elements
x <- list(1, "a", TRUE, 1 + 4i)

# Method 2
# create empty list with the length of 5
x <- vector("list", length = 5)

Defining factors in R
♦ Factors are used to represent categorical data and can be unordered or ordered.
♦ Factors are important in statistical modelling.

# Levels are put in alphabetical order 
x <- factor(c("yes", "yes", "no", "yes", "no"))    

# table() counts how many yes and no values there are
table(x)

# Levels are set explicitly, overriding the alphabetical order
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))

Missing Values
Missing values are denoted by NA or NaN. NA represents a missing value, and NaN represents an undefined numeric result (for example 0/0).

# a vector is defined with missing number
x <- c(1, 2, NA, 10, 3)    

# is.na() checks each element of the vector and returns TRUE where the value is missing
is.na(x)    
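
The difference between NA and NaN can be sketched as follows. The vector y is an illustrative assumption.

```r
# Illustrative vector with one NA and one NaN
y <- c(1, NA, 0 / 0, 4)
# is.na() is TRUE for both NA and NaN
is.na(y)
# is.nan() is TRUE only for NaN
is.nan(y)
# na.rm = TRUE drops missing values before computing
mean(y, na.rm = TRUE)
```

Without na.rm = TRUE, mean(y) would itself return a missing value, so it is worth checking for missing values before summarising a vector.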

Data Frames
♦ Data frames are used to store tabular data in R.
♦ Data frames are represented as a special type of list where every element of the list has to have the same length.
♦ Unlike matrices, data frames can store different classes of objects in each column.
♦ In addition to column names, which indicate the names of the variables or predictors, data frames have a special attribute called row.names, which holds information about each row of the data frame.

# Define a data frame in R
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# To show number of rows
nrow(x) 

# To show number of columns
ncol(x)

Managing Data Frames with the dplyr package
♦ The data frame is a key data structure in statistics and in R.
♦ The dplyr package is designed for filtering, re-ordering, and collapsing data frames.

#Installing dplyr package
install.packages("dplyr")

# load dplyr package into your R session
library(dplyr)    

# Load chicago.rds file
chicago <- readRDS('C:\\Users\\ahilan\\Dropbox\\Elect_dept_UOJ\\Statistics using R\\chicago.rds')

#To show number of col and row of data
dim(chicago) 
str(chicago)

# The select() function can be used to select columns of a data frame.

# Suppose we wanted to take the first 3 columns only.
names(chicago)[1:3]
subset <- select(chicago, city:dptp)
head(subset)

# if you wanted to keep every variable that ends with a “2”
subset <- select(chicago, ends_with("2"))

# You can also omit variables using the select()
select(chicago, -(city:dptp))

# If we wanted to keep every variable that starts with a “d”
subset <- select(chicago, starts_with("d"))

# The filter() function is used to extract subsets of rows from a data frame
# Extract the rows where PM2.5 is greater than 30
chic.f <- filter(chicago, pm25tmean2 > 30)    
str(chic.f)
summary(chic.f$pm25tmean2)

# Extract the rows where PM2.5 is greater than 30 and temperature is greater
# than 80 degrees Fahrenheit.
chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80)
str(chic.f)
summary(chic.f$pm25tmean2)

# The arrange() function is used to reorder rows of a data frame according to one
# of the variables/columns

# We can order the rows of the data frame by date, so that the first row is the
# earliest (oldest) observation and the last row is the latest (most recent)
# observation.
chicago <- arrange(chicago, date)
# Use desc() to order the rows by date in descending order instead
chicago <- arrange(chicago, desc(date))

Logical operation in R

# define a vector using boolean values
a <- c(TRUE, FALSE, FALSE, TRUE)
# define a numeric vector
b <- c(13, 7, 8, 2)    
# selects true value elements
b[a]   # 13 2
# inverse of a
!a   # FALSE TRUE TRUE FALSE
# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(a)  # 2

Built-in search function

example(mean)
help.search("optimization")
help(mean)

Data input and output

#Changing directories
# Change the working directory to the mydata folder on the C: drive
setwd("c:\\mydata")

#Save the objects for a future session
dump("usefuldata", "useful.R")

#Retrieve the saved objects
source("useful.R")

#Save all of the objects that you have created during a session
dump(list=objects(), "all.R")

#Redirecting R output to text file
# Create a file solarmean.txt for output
sink("solarmean.txt") 
# Write mean value to solarmean.txt
mean(solar.radiation)
# Close solarmean.txt; print new output to screen
sink()