# Learning R (Introduction)

♦ The R programming language is generally used for statistical analysis, graphical representation of data, and reporting.

The <- symbol is the assignment operator.
The # symbol starts a comment: everything after it on the same line is ignored by R.

```r
# x equals 1
x <- 1
# msg equals "hello"
msg <- "hello"
```

Basic arithmetic operations in R

```r
# addition
18 + 12
# subtraction
18 - 12
# multiplication
18 * 12
# division
18 / 12
# just the integer part of the quotient
18 %/% 12
# just the remainder part (modulo)
18 %% 12
# exponentiation (raising to a power)
18 ^ 12
# natural log (base e)
log(10)
# base 10 logs
log10(100)
# square root
sqrt(88)
# absolute value
abs(18 / -12)
```
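The integer division and modulo operators fit together: R's arithmetic documentation states that x == (x %% y) + y * (x %/% y) holds (up to rounding error) whenever y is not 0. A quick check with the numbers used above, plus a negative dividend to show that %/% rounds down. The names q, r, q.neg and r.neg are just for illustration:

```r
# Quotient and remainder of 18 divided by 12
q <- 18 %/% 12   # 1
r <- 18 %% 12    # 6
q * 12 + r       # 18, the original number

# %/% rounds down, so with a positive divisor the remainder stays non-negative
q.neg <- (-18) %/% 12   # -2
r.neg <- (-18) %% 12    # 6
q.neg * 12 + r.neg      # -18, the original number
```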

Defining vectors in R

```r
# Method 1 - Use c() to combine values into a vector
# numeric vector
x <- c(0.5, 0.6)
# complex vector
x <- c(1+0i, 2+4i)

# Method 2 - Use the vector() function to initialize vectors
# (a numeric vector of length 10, filled with zeros)
x <- vector("numeric", length = 10)

# Method 3 - Create sequences of numbers
# number sequence 1, 2, 3, ..., 10
1:10
# number sequence from 1 to 10 with an interval of 1
my.seq <- seq(from = 1, to = 10, by = 1)

# Some useful functions
# check the class (data type) of my.seq
class(my.seq)
# check whether my.seq is a vector
is.vector(my.seq)
# divide each element of my.seq by 3
my.seq <- my.seq / 3
```
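The two ways of building 1 to 10 shown above give the same values but not the same storage type, and vector("numeric", ...) starts out filled with zeros. A small sketch to see this (the names int.seq, dbl.seq and z are hypothetical):

```r
# Two ways to build the numbers 1 to 10
int.seq <- 1:10
dbl.seq <- seq(from = 1, to = 10, by = 1)

class(int.seq)           # "integer": the colon operator returns integers here
class(dbl.seq)           # "numeric": seq() with a numeric 'by' returns doubles
all(int.seq == dbl.seq)  # TRUE: same values despite the different types

# vector("numeric", ...) fills the new vector with zeros
z <- vector("numeric", length = 10)
length(z)                # 10
sum(z)                   # 0

# Arithmetic is vectorised: every element is divided by 3
dbl.seq / 3
```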

Defining matrices in R

```r
# Method 1
# create an empty 2 by 3 matrix (all entries are NA)
m <- matrix(nrow = 2, ncol = 3)
# check the dimensions
dim(m)

# Method 2
# matrices are filled column-wise by default
m <- matrix(1:6, nrow = 2, ncol = 3)

# Method 3
# create a matrix directly from a vector by adding a dimension attribute
m <- 1:10
dim(m) <- c(2, 5)

# Method 4
x <- 1:3
y <- 10:12
# create a matrix by column-binding
cbind(x, y)
# create a matrix by row-binding
rbind(x, y)
```
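To see what "filled column-wise" means, and how cbind() and rbind() differ in shape, it can help to inspect rows and dimensions directly. A short sketch, where m2 is a hypothetical name and byrow = TRUE is the matrix() argument that switches to row-wise filling:

```r
# matrix() fills column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, ]            # 1 3 5  (first row)
m[, 1]            # 1 2    (first column)

# byrow = TRUE fills row by row instead
m2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m2[1, ]           # 1 2 3

# cbind() stacks vectors as columns, rbind() stacks them as rows
x <- 1:3
y <- 10:12
dim(cbind(x, y))  # 3 2
dim(rbind(x, y))  # 2 3
```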

Defining lists in R
Lists are a special type of vector that can contain elements of different classes.

```r
# Method 1
# this list contains elements of different classes
x <- list(1, "a", TRUE, 1 + 4i)

# Method 2
# create an empty list of length 5
x <- vector("list", length = 5)
```
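Because each list element keeps its own class, it is worth seeing how to inspect and extract them. A minimal sketch (the name empty is hypothetical) using sapply() to apply class() to every element, and showing the difference between [[ ]] and [ ]:

```r
x <- list(1, "a", TRUE, 1 + 4i)

# Each element keeps its own class
sapply(x, class)      # "numeric" "character" "logical" "complex"

# [[ ]] extracts a single element; [ ] returns a sub-list
x[[2]]                # "a"
class(x[2])           # "list"

# Elements of a list created with vector("list", ...) start out as NULL
empty <- vector("list", length = 5)
length(empty)         # 5
is.null(empty[[1]])   # TRUE
```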

Defining factors in R
♦ Factors are used to represent categorical data and can be unordered or ordered.
♦ Factors are important in statistical modelling.

```r
# By default, levels are put in alphabetical order ("no", "yes")
x <- factor(c("yes", "yes", "no", "yes", "no"))

# table() counts how many "yes" and "no" values there are
table(x)

# The levels argument sets the order of the levels explicitly ("yes", "no")
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
```
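A short sketch of what the levels argument changes, and of an ordered factor, since factors "can be unordered or ordered". The names answers, f1, f2 and sizes are hypothetical, and the size categories are made up for illustration:

```r
answers <- c("yes", "yes", "no", "yes", "no")

f1 <- factor(answers)
levels(f1)        # "no" "yes"  (alphabetical)

f2 <- factor(answers, levels = c("yes", "no"))
levels(f2)        # "yes" "no"  (order given by 'levels')
table(f2)         # yes: 3, no: 2

# Internally, a factor stores integer codes that point into its levels
as.integer(f1)    # 2 2 1 2 1
as.integer(f2)    # 1 1 2 1 2

# An ordered factor supports comparisons based on the level order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
sizes < "large"   # TRUE FALSE TRUE
```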

Missing Values
Missing values are denoted by NA or NaN. NA is used to represent missing values, and NaN ("not a number") represents the result of an undefined mathematical operation such as 0/0.

```r
# define a vector with a missing value
x <- c(1, 2, NA, 10, 3)

# returns TRUE for each element that is NA and FALSE otherwise
is.na(x)
```
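is.na() works element by element, so a couple of extra calls are needed to ask "is anything missing?" or to compute a summary while ignoring the gaps. A sketch using the same vector (y is a hypothetical name), which also shows how NA and NaN relate:

```r
x <- c(1, 2, NA, 10, 3)

is.na(x)               # FALSE FALSE TRUE FALSE FALSE
any(is.na(x))          # TRUE: at least one value is missing
sum(is.na(x))          # 1: the number of missing values

# Most summaries return NA when the input contains NA...
mean(x)                # NA
# ...unless the missing values are dropped explicitly
mean(x, na.rm = TRUE)  # 4

# NaN comes from undefined operations such as 0/0
y <- 0 / 0
is.nan(y)              # TRUE
is.na(y)               # TRUE: is.na() also flags NaN
is.nan(NA)             # FALSE: NA is not NaN
```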

Data Frames
♦ Data frames are used to store tabular data in R.
♦ Data frames are represented as a special type of list where every element of the list has to have the same length.
♦ Unlike matrices, data frames can store different classes of objects in each column.
♦ In addition to column names, which give the names of the variables or predictors, data frames have a special attribute called row.names that holds information about each row of the data frame.

```r
# Define a data frame in R
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# show the number of rows
nrow(x)

# show the number of columns
ncol(x)
```
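The bullet points above (different classes per column, row names, list-like structure) can be checked directly on the same small data frame. A minimal sketch:

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Each column can have its own class
sapply(x, class)   # foo: "integer", bar: "logical"

# Default row names are "1", "2", ...
row.names(x)       # "1" "2" "3" "4"

# Columns are accessed like list elements
x$foo              # 1 2 3 4
x[x$bar, ]         # only the rows where bar is TRUE

# dim() gives nrow(x) and ncol(x) together
dim(x)             # 4 2
```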

Managing Data Frames with the dplyr package
♦ The data frame is a key data structure in statistics and in R.
♦ The dplyr package is designed for filtering, re-ordering, and collapsing data frames.

```r
# Install the dplyr package (only needed once)
install.packages("dplyr")

library(dplyr)

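# Read the chicago data set from its .rds file (adjust the path to where the file is saved)
chicago <- readRDS('R\\chicago.rds')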

# Show the number of rows and columns, then the structure of the data
dim(chicago)
str(chicago)

# The select() function can be used to select columns of a data frame.

# Suppose we wanted to take the first 3 columns only.
names(chicago)[1:3]
subset <- select(chicago, city:dptp)

# To keep every variable that ends with a "2"
subset <- select(chicago, ends_with("2"))

# You can also omit variables using the select()
select(chicago, -(city:dptp))

# To keep every variable that starts with a "d"
subset <- select(chicago, starts_with("d"))

# The filter() function is used to extract subsets of rows from a data frame
# Extract the rows where PM2.5 is greater than 30
chic.f <- filter(chicago, pm25tmean2 > 30)
str(chic.f)
summary(chic.f$pm25tmean2)

# Extract the rows where PM2.5 is greater than 30 and temperature is greater
# than 80 degrees Fahrenheit
chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80)
str(chic.f)
summary(chic.f$pm25tmean2)

# The arrange() function is used to reorder the rows of a data frame according
# to one of the variables/columns

# Order the rows of the data frame by date, so that the first row is the
# earliest (oldest) observation and the last row is the latest (most recent)
# observation
chicago <- arrange(chicago, date)
# Wrap the column in desc() to order from the most recent to the oldest observation
chicago <- arrange(chicago, desc(date))
```
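Since the chicago.rds file may not be at hand, the same select(), filter() and arrange() calls can be tried on a tiny data frame. This is a sketch only: toy is a hypothetical name, its column names mirror the chicago example, and all values are made up. It assumes dplyr has been installed as shown above.

```r
library(dplyr)

# Hypothetical toy data: column names mirror chicago, values are invented
toy <- data.frame(
  city = "chic",
  tmpd = c(75, 85, 90, 60),
  dptp = c(60, 70, 72, 40),
  date = as.Date(c("2005-01-03", "2005-01-01", "2005-01-04", "2005-01-02")),
  pm25tmean2 = c(35, 40, 20, 31)
)

names(select(toy, city:dptp))          # "city" "tmpd" "dptp"
names(select(toy, -(city:dptp)))       # "date" "pm25tmean2"
names(select(toy, starts_with("d")))   # "dptp" "date"

nrow(filter(toy, pm25tmean2 > 30))              # 3
nrow(filter(toy, pm25tmean2 > 30 & tmpd > 80))  # 1

arrange(toy, date)$date[1]        # "2005-01-01" (oldest first)
arrange(toy, desc(date))$date[1]  # "2005-01-04" (newest first)
```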

Logical operations in R

```r
# define a vector using boolean values
a <- c(TRUE, FALSE, FALSE, TRUE)
# define a numeric vector
b <- c(13, 7, 8, 2)
# select the elements of b where a is TRUE
b[a]    # 13 2
# negate each element of a
!a      # FALSE TRUE TRUE FALSE
# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(a)  # 2
```
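Logical vectors usually come from comparisons rather than being typed by hand, and they combine element by element. A small sketch building on the same a and b:

```r
a <- c(TRUE, FALSE, FALSE, TRUE)
b <- c(13, 7, 8, 2)

# Comparisons return a logical vector that can be used for indexing
b > 5          # TRUE TRUE TRUE FALSE
b[b > 5]       # 13 7 8

# which() gives the positions of the TRUE values
which(a)       # 1 4

# & and | combine logical vectors element by element
a & (b > 5)    # TRUE FALSE FALSE FALSE
a | (b > 5)    # TRUE TRUE TRUE TRUE

# mean() of a logical vector is the proportion of TRUE values
mean(a)        # 0.5
```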

Built-in help and search functions

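```r
# run the examples from the help page of mean()
example(mean)
# search the help system for a topic
help.search("optimization")
# open the help page for mean() (same as ?mean); R is case-sensitive, so Help() does not work
help(mean)
```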

Data input and output

```r
# Change the working directory
# (here, to the mydata folder on the C: drive)
setwd("C:\\mydata")

# Save the objects for a future session
dump("usefuldata", "useful.R")

# Retrieve the saved objects
source("useful.R")

# Save all of the objects that you have created during a session
dump(list = objects(), file = "all.R")

# Redirect R output to a text file
# Create a file solarmean.txt for output
sink("solarmean.txt")
# Write mean value to solarmean.txt