Learning R (Introduction)

♦ The R programming language is generally used for statistical analysis, graphical representation, and reporting.

The <- symbol is the assignment operator.
The # symbol starts a comment; R ignores the rest of the line.

# x equals 1
x <- 1
# msg equals "hello"
msg <- "hello"

The basic arithmetic operations using R

# addition
18 + 12
# subtraction
18 - 12
# multiplication
18 * 12
# division    
18 / 12 
# just the integer part of the quotient
18 %/% 12
# just the remainder part (modulo)
18 %% 12
# exponentiation (raising to a power)
18 ^ 12
# natural log (base e)
log(18)
# base 10 logs
log10(18)
# square root
sqrt(18)
# absolute value
abs(18 / -12)
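
As a quick check on the operators above, the sketch below shows that the integer quotient and the remainder together rebuild the original number. The inputs to the log and root functions are chosen here only because they give round results; they are not from the original examples.

```r
# Integer quotient and remainder of 18 divided by 12
q <- 18 %/% 12       # 1
r <- 18 %% 12        # 6
# quotient * divisor + remainder gives back the dividend
q * 12 + r           # 18

# Illustrative inputs chosen to give round answers
log(exp(1))          # 1
log10(1000)          # 3
sqrt(16)             # 4
abs(-1.5)            # 1.5
```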

Defining vectors in R

# Method 1
# numeric vector
x <- c(0.5, 0.6)
# complex vector
x <- c(1+0i, 2+4i)    

# Method 2 - Use the vector() function to initialize vectors
x <- vector("numeric", length = 10)

# Method 3 - Creating a vector of numbers
# number sequence 1, 2, 3, ..., 10
my.seq <- 1:10
# number sequence from 1 to 10 with an interval of 1
seq(from=1, to=10, by=1)

# Some useful functions
# to check the data type of my.seq
class(my.seq)
# to check whether my.seq is a vector
is.vector(my.seq)
# divide each element of my.seq by 3
my.seq <- my.seq / 3
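
The following is a small sketch of what the vector-creating functions above return. The lengths and step size are picked only for illustration.

```r
# vector() fills a numeric vector with zeros
x <- vector("numeric", length = 3)
x                                     # 0 0 0

# seq() with a step of 3 instead of 1
s <- seq(from = 1, to = 10, by = 3)
s                                     # 1 4 7 10

# Arithmetic is applied to every element of the vector
halves <- s / 2
halves                                # 0.5 2.0 3.5 5.0
```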

Defining matrices in R

# Method 1
# To create empty 2 by 3 matrix
m <- matrix(nrow = 2, ncol = 3)
# To check the dimensionality
dim(m)

# Method 2 
# Matrices are constructed column-wise
m <- matrix(1:6, nrow = 2, ncol = 3)

# Method 3
# A matrix can be created directly from a vector by adding a dimension attribute.
m <- 1:10
dim(m) <- c(2, 5)

# Method 4
x <- 1:3
y <- 10:12
# create matrix by column-binding
cbind(x, y)
# create matrix by row-binding 
rbind(x, y)
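
To see what column-wise construction means in practice, the sketch below fills the same values both ways. The byrow argument belongs to base R's matrix(); the variable names m_col and m_row are only for illustration.

```r
# Fill 1:6 column by column (the default)
m_col <- matrix(1:6, nrow = 2, ncol = 3)
# Fill 1:6 row by row instead
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

m_col[1, ]   # first row of the column-wise matrix: 1 3 5
m_row[1, ]   # first row of the row-wise matrix: 1 2 3
dim(m_col)   # 2 3
```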

Defining lists in R
Lists are a special type of vector that can contain elements of different classes.

# Method 1
# this list contains elements of different classes
x <- list(1, "a", TRUE, 1 + 4i)

# Method 2
# create an empty list of length 5
x <- vector("list", length = 5)
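
A short sketch of how the mixed-class list above behaves when you inspect it. It uses only base R; the name classes is an illustrative variable.

```r
x <- list(1, "a", TRUE, 1 + 4i)

# Collect the class of every element of the list
classes <- sapply(x, class)
classes        # "numeric" "character" "logical" "complex"

# Double brackets extract the element itself
x[[2]]         # "a"
# Single brackets return a shorter list
class(x[2])    # "list"
```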

Defining factors in R
♦ Factors are used to represent categorical data and can be unordered or ordered.
♦ Factors are important in statistical modelling.

# Levels are put in alphabetical order 
x <- factor(c("yes", "yes", "no", "yes", "no"))    

# table() shows how many "yes" and "no" values there are
table(x)

# Levels can also be given in an explicit, non-alphabetical order
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
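
The sketch below compares the two factor definitions above. It shows the level order each one produces and the integer codes R stores underneath; the names x, y and counts are illustrative.

```r
# Default: levels are sorted alphabetically
x <- factor(c("yes", "yes", "no", "yes", "no"))
levels(x)          # "no" "yes"

# Count how often each level occurs
counts <- table(x)
counts[["yes"]]    # 3

# Explicit level order
y <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(y)          # "yes" "no"
# Each value is stored as the position of its level
as.integer(y)      # 1 1 2 1 2
```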

Missing Values
Missing values are denoted by NA or NaN. NA represents a missing value, and NaN represents an invalid numeric result (such as 0/0).

# define a vector with a missing value
x <- c(1, 2, NA, 10, 3)

# check, element by element, whether this vector has NA values
is.na(x)
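
A minimal sketch of how NA and NaN behave in base R. The na.rm argument of mean() is not mentioned above and is shown here as one common way to skip missing values.

```r
x <- c(1, 2, NA, 10, 3)

# Which elements are missing, and how many
is.na(x)                              # FALSE FALSE TRUE FALSE FALSE
n_missing <- sum(is.na(x))
n_missing                             # 1

# A missing value propagates unless it is removed first
mean(x)                               # NA
clean_mean <- mean(x, na.rm = TRUE)
clean_mean                            # 4

# 0/0 is an invalid number: NaN
y <- 0 / 0
is.nan(y)                             # TRUE
is.na(y)                              # TRUE: NaN also counts as NA
is.nan(NA)                            # FALSE: NA is not NaN
```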

Data Frames
♦ Data frames are used to store tabular data in R.
♦ Data frames are represented as a special type of list where every element of the list has to have the same length.
♦ Unlike matrices, data frames can store different classes of objects in each column.
♦ In addition to column names, which give the names of the variables or predictors, data frames have a special attribute called row.names that holds information about each row of the data frame.

# Define a data frame in R
x <- data.frame(foo = 1:4, bar = c(T, T, F, F))

# To show the number of rows
nrow(x)

# To show the number of columns
ncol(x)
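
The points above can be checked on the same small data frame. This sketch uses only base R; col_classes is an illustrative name.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

nrow(x)                          # 4
ncol(x)                          # 2

# A data frame is a list whose elements are the columns
is.list(x)                       # TRUE

# Each column can hold a different class
col_classes <- sapply(x, class)
col_classes                      # foo: "integer", bar: "logical"

# Default row names are "1", "2", ...
row.names(x)                     # "1" "2" "3" "4"
```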

Managing Data Frames with the dplyr package
♦ The data frame is a key data structure in statistics and in R.
♦ The dplyr package is designed for filtering, re-ordering, and collapsing data frames.

# Installing the dplyr package
install.packages("dplyr")

# load the dplyr package into your R session
library(dplyr)

# Load chicago.rds file
chicago <- readRDS('C:\\Users\\ahilan\\Dropbox\\Elect_dept_UOJ\\Statistics using 

# To show the number of rows and columns of the data
dim(chicago)

# The select() function can be used to select columns of a data frame.

# Suppose we wanted to take the first 3 columns only.
subset <- select(chicago, city:dptp)

# if you wanted to keep every variable that ends with a “2”
subset <- select(chicago, ends_with("2"))

# You can also omit variables by putting a minus sign in front of them in select()
select(chicago, -(city:dptp))

# If we wanted to keep every variable that starts with a “d”
subset <- select(chicago, starts_with("d"))

# The filter() function is used to extract subsets of rows from a data frame
# Extract the rows where PM2.5 is greater than 30
chic.f <- filter(chicago, pm25tmean2 > 30)    

# Extract the rows where PM2.5 is greater than 30 and temperature is greater than 80 degrees Fahrenheit.
chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80)

# The arrange() function is used to reorder rows of a data frame according to one of the variables/columns

# We can order the rows of the data frame by date, so that the first row is the
# earliest (oldest) observation and the last row is the latest (most recent)
chicago <- arrange(chicago, date)
# Use desc() to order the rows by date in descending order instead
chicago <- arrange(chicago, desc(date))
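
Since the chicago.rds file may not be at hand, the sketch below tries the same dplyr verbs on a tiny made-up data frame. The data frame toy and all of its values are hypothetical; only the column names are borrowed from the examples above. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for the chicago data (values are invented)
toy <- data.frame(
  city = rep("chic", 4),
  tmpd = c(85, 60, 90, 82),
  date = as.Date(c("2005-07-03", "2005-01-10", "2005-08-01", "2005-06-15")),
  pm25tmean2 = c(35, 12, 40, 20)
)

# Keep rows where PM2.5 is above 30 and temperature is above 80
hot_polluted <- filter(toy, pm25tmean2 > 30 & tmpd > 80)
nrow(hot_polluted)      # 2

# Oldest observation first
by_date <- arrange(toy, date)
by_date$date[1]         # "2005-01-10"

# Most recent observation first
newest <- arrange(toy, desc(date))
newest$date[1]          # "2005-08-01"

# Keep only the columns whose names start with "d"
d_cols <- select(toy, starts_with("d"))
names(d_cols)           # "date"
```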

Logical operations in R

# define a vector using boolean values
a <- c(TRUE, FALSE, FALSE, TRUE)
# define a numeric vector
b <- c(13, 7, 8, 2)
# selects the elements of b where a is TRUE
b[a]   # 13 2
# inverse of a
!a     # FALSE TRUE TRUE FALSE
# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(a)  # 2
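
In practice the logical vector usually comes from a comparison rather than being typed by hand. The sketch below shows that pattern. The values of a are inferred from the outputs shown above (13 2 and 2), and big is an illustrative name.

```r
a <- c(TRUE, FALSE, FALSE, TRUE)   # inferred from the outputs above
b <- c(13, 7, 8, 2)

# A comparison returns one TRUE/FALSE per element
big <- b > 7
big            # TRUE FALSE TRUE FALSE

# Use the result to pick elements
b[big]         # 13 8
# Negation selects the other elements
b[!a]          # 7 8
# Count how many elements pass the test
sum(big)       # 2
```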

Built-in search function


Data input and output

# Changing directories
# Change the default to the mydata folder on the C: drive
setwd("c:\\mydata")

#Save the objects for a future session
dump("usefuldata", "useful.R")

# Retrieve the saved objects
source("useful.R")

#Save all of the objects that you have created during a session
dump(list=objects(), "all.R")
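
A minimal round trip with dump() and source(), written to a temporary folder so it does not touch your working directory. The contents of usefuldata are hypothetical, and passing local = TRUE to source() is a choice made here so the object is restored in the environment the code runs in.

```r
# Hypothetical object to save
usefuldata <- c(2.5, 3.1, 4.8)

# Write the object as R code to a file in the temporary folder
path <- file.path(tempdir(), "useful.R")
dump("usefuldata", file = path)

# Remove the object, then bring it back from the file
rm(usefuldata)
source(path, local = TRUE)
usefuldata     # 2.5 3.1 4.8
```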

# Redirecting R output to a text file
# Create a file solarmean.txt for output
sink("solarmean.txt")
# Write mean value to solarmean.txt
# Close solarmean.txt; print new output to screen
sink()
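
The three steps above can be sketched with sink(). The vector solar and its values are hypothetical stand-ins, since the original data is not shown, and the file is written to the temporary folder.

```r
# Hypothetical data (the original data set is not shown)
solar <- c(200, 220, 240)

# Create solarmean.txt in the temporary folder and send output to it
out_file <- file.path(tempdir(), "solarmean.txt")
sink(out_file)
# print() is needed when the code runs inside a script
print(mean(solar))
# Close the file; new output goes to the screen again
sink()

# Read the file back to confirm what was written
written <- readLines(out_file)
written        # "[1] 220"
```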
