**Data matrix**
Data can often be represented or abstracted as an n×d data matrix, with n rows and d columns, where rows correspond to entities (points) in the dataset, and columns represent attributes, features, or properties of interest.

The n×d data matrix is given as D = [x_{ij}], where row i is the point x_i = (x_{i1}, x_{i2}, . . . , x_{id}) and column j is the attribute X_j = (x_{1j}, x_{2j}, . . . , x_{nj})^{T}.
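As a concrete sketch, a data matrix can be held as a NumPy array, with rows as points and columns as attributes (the values below are hypothetical):

```python
import numpy as np

# Hypothetical 4x3 data matrix: n = 4 entities (rows), d = 3 attributes (columns).
D = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
])

n, d = D.shape   # n points, d attributes
x1 = D[0]        # first point (a row of D)
X1 = D[:, 0]     # first attribute (a column of D)
```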

**Numeric Attributes** – A numeric attribute is one that has a real-valued or integer-valued domain. For example, Age.

**Categorical Attributes** – A categorical attribute is one that has a set-valued domain composed of a set of symbols. For example, Sex is a categorical attribute.

**Orthogonality** – Two vectors a and b are said to be orthogonal if the angle between them is 90°, which implies that cos θ = 0; equivalently, the dot product a^{T}b = 0.

**Orthogonal Projection** – In data mining, we may need to project a point or vector onto another vector, for example to obtain a new point after a change of the basis vectors. Let a and b be two m-dimensional vectors. An orthogonal decomposition of the vector b in the direction of another vector a, as illustrated in the figure below, is given as

b = p + r, where p = ((a^{T}b) / (a^{T}a)) a and r = b − p

The vector p is called the orthogonal projection, or simply projection, of b on the vector a, and r is orthogonal to a.
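The projection follows directly from the dot products; a minimal NumPy sketch with two hypothetical vectors:

```python
import numpy as np

def project(b, a):
    """Orthogonal projection of b onto a: p = (a.b / a.a) * a."""
    return (np.dot(a, b) / np.dot(a, a)) * a

a = np.array([1.0, 0.0])
b = np.array([2.0, 3.0])
p = project(b, a)   # component of b along a
r = b - p           # residual component, orthogonal to a
```

The residual r satisfies r · a = 0, confirming the decomposition b = p + r.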

**Centered Data Matrix**
The centered data matrix is obtained by subtracting the mean from all the points: Z = D − 1·µ^{T}, where 1 ∈ R^{n} is the vector of all ones and µ is the multivariate mean vector; row i of Z is the centered point z_i = x_i − µ.
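Centering is a one-line operation in NumPy, relying on broadcasting to subtract the mean vector from every row (hypothetical values):

```python
import numpy as np

D = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

mu = D.mean(axis=0)   # multivariate mean vector (one mean per column)
Z = D - mu            # centered data matrix: subtract mu from every row
```

After centering, each column of Z has mean zero.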

**Linear Independence**
We say that the vectors v1, . . . , vk are linearly dependent if at least one vector can be written as a linear combination of the others; equivalently, if

c1·v1 + c2·v2 + · · · + ck·vk = 0

for some scalars c1, c2, . . . , ck, not all zero.

A set of vectors is linearly independent if none of them can be written as a linear combination of the other vectors in the set.

**Dimension and Rank**
The maximum number of linearly independent row (or column) vectors in a matrix is equal to the number of non-zero rows in its row echelon form. Therefore, to find the rank of a matrix, we simply transform the matrix to its row echelon form and count the number of non-zero rows.

For the data matrix D ∈ R^{n×d}, we have rank(D) ≤ min(n, d), which follows from the fact that the column space can have dimension at most d, and the row space can have dimension at most n. If rank(D) < d, then the data points reside in a lower dimensional subspace of R^{d}, and in this case rank(D) gives an indication about the intrinsic dimensionality of the data.
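The rank, and hence a hint of the intrinsic dimensionality, can be checked numerically. In this hypothetical matrix the third column is the sum of the first two, so rank(D) < d:

```python
import numpy as np

# Hypothetical 4x3 data matrix whose third column equals col1 + col2.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0],
              [1.0, 1.0, 2.0]])

r = np.linalg.matrix_rank(D)   # the points lie in a 2-d subspace of R^3
```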

In fact, with dimensionality reduction methods it is often possible to approximate D ∈ R^{n×d} with a derived data matrix D′ ∈ R^{n×k}, which has much lower dimensionality, that is, k ≪ d. In this case k may reflect the “true” intrinsic dimensionality of the data.

**Statistic**
We can estimate a parameter of the population by defining an appropriate sample statistic, which is defined as a function of the sample.

The random sample of size m drawn from a (multivariate) random variable X is defined as S = {S1, S2, . . . , Sm}, where each Si is a random variable that is independent of, and identically distributed as, X.

A statistic θ̂ is a function θ̂: (S1, S2, . . . , Sm) → R of the sample.

The statistic θ̂ is an estimate of the corresponding population parameter θ. If we use the value of a statistic to estimate a population parameter, this value is called a point estimate of the parameter, and the statistic is called an estimator of the parameter.
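As a small illustration (the uniform population and seed are hypothetical choices), the sample mean serves as a point estimate of the population mean, which is 5 for a uniform(0, 10) population:

```python
import random

random.seed(0)
# Hypothetical population: uniform(0, 10), whose true mean is 5.
sample = [random.uniform(0, 10) for _ in range(1000)]

# The sample mean is a statistic (a function of the sample) and a
# point estimate of the population mean.
theta_hat = sum(sample) / len(sample)
```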

**Univariate analysis**
Univariate analysis focuses on a single attribute at a time. In this case the data matrix reduces to an n×1 matrix, i.e., a column vector X = (x1, x2, . . . , xn)^{T}.

X is assumed to be a random variable.

**Mean** – The mean, also called the expected value, of a random variable X is the arithmetic average of the values of X. The mean of a discrete random variable X is defined as

µ = E[X] = Σ_{x} x·f(x)

where f(x) is the probability mass function of X.

The expected value of a continuous random variable X is defined as

µ = E[X] = ∫ x·f(x) dx

where f(x) is the probability density function of X and the integral is over the whole real line.

**Sample Mean** – The sample mean is a statistic, µ̂: {x1, x2, . . . , xn} → R, defined as the average value of the xi’s:

µ̂ = (1/n) Σ_{i=1}^{n} x_{i}

A statistic is robust if it is not affected by extreme values (outliers) in the data. The sample mean is not robust, since a single large value can skew it.

**Median** – The median of a random variable X is defined as the value m such that

P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2

The median is robust, as it is not affected very much by extreme values.
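A quick sketch of this robustness contrast, using Python’s statistics module on hypothetical values:

```python
import statistics

values = [1, 2, 3, 4, 5]
with_outlier = values + [1000]

m1 = statistics.mean(values)          # 3
m2 = statistics.mean(with_outlier)    # pulled far away by the single outlier
med1 = statistics.median(values)      # 3
med2 = statistics.median(with_outlier)  # 3.5 -- barely moves
```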

**Measures of Dispersion**
The measures of dispersion give an indication about the spread or variation in the values of a random variable.

**Range**
The range of a random variable X is the difference between the maximum and minimum values of X, defined as

r = max(X) − min(X)

The sample range is the difference between the largest and smallest values in the sample.

**Interquartile Range**
Quartiles divide the data into four equal parts, corresponding to the quantile values 0.25, 0.5, 0.75, and 1.0. The first quartile is q1 = F^{−1}(0.25), the second quartile is the same as the median value, q2 = F^{−1}(0.5), and the third quartile is q3 = F^{−1}(0.75), where F is the cumulative distribution function of X.

The interquartile range (IQR) is defined as

IQR = q3 − q1
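The quartiles and IQR can be estimated from a sample with np.percentile (the data values are hypothetical; note that different quantile interpolation rules give slightly different answers):

```python
import numpy as np

x = np.array([1, 3, 5, 7, 9, 11, 13, 15])
q1, q2, q3 = np.percentile(x, [25, 50, 75])  # default linear interpolation
iqr = q3 - q1
```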

**Variance and Standard Deviation**
The variance of a random variable X provides a measure of how much the values of X deviate from the mean or expected value of X. The variance is defined as

σ^{2} = var(X) = E[(X − µ)^{2}]

The standard deviation, σ, is defined as the square root of the variance, σ^{2}.

The sample variance is defined as

σ̂^{2} = (1/n) Σ_{i=1}^{n} (x_{i} − µ̂)^{2}

The standard score, or z-score, of a sample value x_{i} is the number of standard deviations the value is away from the mean:

z_{i} = (x_{i} − µ̂) / σ̂
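A short NumPy sketch computing the sample variance (in its 1/n form, as defined above), standard deviation, and z-scores for hypothetical values:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()                 # sample mean
var = ((x - mu) ** 2).mean()  # sample variance, 1/n form
sigma = np.sqrt(var)          # sample standard deviation
z = (x - mu) / sigma          # z-scores: deviations in units of sigma
```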

**Multivariate analysis**
In multivariate analysis, all d numeric attributes are considered together. The full data matrix is D ∈ R^{n×d}, with the points x_{i} ∈ R^{d} as its rows and the attributes X_{j} as its columns.

**Mean**
The multivariate mean vector is obtained by taking the mean of each attribute, defined as

µ = (µ_{1}, µ_{2}, . . . , µ_{d})^{T}, with sample estimate µ̂ = (1/n) Σ_{i=1}^{n} x_{i}

**Covariance Matrix**
The multivariate covariance information is captured by the d×d (square) symmetric covariance matrix that gives the covariance for each pair of attributes:

Σ = [σ_{ij}], where σ_{ij} = E[(X_{i} − µ_{i})(X_{j} − µ_{j})]

The diagonal element σ_{i}^{2} = σ_{ii} specifies the variance of attribute X_{i}, whereas the off-diagonal elements σ_{ij} = σ_{ji} represent the covariance between attribute pairs X_{i} and X_{j}.
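The sample covariance matrix can be computed from the centered data matrix as (1/n)·Z^{T}Z; a minimal sketch with hypothetical values (note that np.cov defaults to the 1/(n−1) form, so we compute it directly):

```python
import numpy as np

D = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

Z = D - D.mean(axis=0)          # center the data
Sigma = (Z.T @ Z) / D.shape[0]  # d x d sample covariance matrix, 1/n form

# Sigma is symmetric; the diagonal holds the attribute variances.
```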

**Data Normalization**
When analyzing two or more attributes it is often necessary to normalize the values of the attributes, especially in those cases where the values are vastly different in scale.

In range normalization, each value is scaled as

x′_{i} = (x_{i} − min_{i}{x_{i}}) / (max_{i}{x_{i}} − min_{i}{x_{i}})

After the transformation the new attribute takes on values in the range [0, 1].

**Standard Score Normalization**
In standard score normalization, also called z-normalization, each value is replaced by its z-score:

x′_{i} = (x_{i} − µ̂) / σ̂
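Both normalizations in a minimal NumPy sketch (hypothetical values):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Range normalization: scale into [0, 1].
x_range = (x - x.min()) / (x.max() - x.min())

# z-normalization: zero mean, unit standard deviation.
x_z = (x - x.mean()) / x.std()
```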

**Univariate Normal Distribution**
If a random variable X has a normal distribution, with the parameters mean µ and variance σ^{2}, the probability density function of X is given as

f(x) = (1 / √(2πσ^{2})) exp{ −(x − µ)^{2} / (2σ^{2}) }

**Probability Mass**
Given an interval [a, b], the probability mass of the normal distribution within that interval is given as

P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx

The probability mass concentrated within k standard deviations from the mean is given as

P(µ − kσ ≤ X ≤ µ + kσ) = ∫_{µ−kσ}^{µ+kσ} f(x) dx
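Since the normal CDF can be written in terms of the error function, this probability mass needs only the standard library; a sketch assuming X ~ N(µ, σ²):

```python
import math

def normal_mass(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), via the error function."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return cdf(b) - cdf(a)

# Mass within k standard deviations of the mean (the 68-95-99.7 rule):
m1 = normal_mass(-1, 1)   # ~0.6827
m2 = normal_mass(-2, 2)   # ~0.9545
m3 = normal_mass(-3, 3)   # ~0.9973
```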

Normal distribution with different variances

**Multivariate Normal Distribution**
Given the d-dimensional vector random variable X = (X1, X2, . . . , Xd), we say that X has a multivariate normal distribution, with the parameters mean µ and covariance matrix Σ, if the joint multivariate probability density function is given as

f(x) = (1 / ((√(2π))^{d} |Σ|^{1/2})) exp{ −(x − µ)^{T} Σ^{−1} (x − µ) / 2 }
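A direct NumPy sketch of this density (a naive implementation using the explicit inverse and determinant, fine for illustration but not for large d):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density at point x, for mean mu and covariance Sigma."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    mahal = diff @ np.linalg.inv(Sigma) @ diff  # squared Mahalanobis distance
    return norm * np.exp(-0.5 * mahal)
```

At x = µ with Σ = I, the exponent vanishes and the density reduces to 1 / (2π)^{d/2}.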

An example of a bivariate normal density and its contours is shown in the figure below.