Exploratory Data Analysis (EDA)

The goals of exploratory data analysis are listed as follows:

✓ Detection of data errors
✓ Checking of assumptions
✓ Finding hidden patterns
✓ Preliminary selection of appropriate models
✓ Determining relationships between the variables

Bar chart – After data processing, a bar chart is used to plot discrete values against categorical data.

Pie chart – A pie chart is used to show, in an easily readable way, how a whole is divided into proportions.

Scatter plot – A scatter plot is used to see the relationships between features.

Single line chart – A line chart displays continuous data as a series of points connected by straight line segments. It is often used to plot time series data and to understand trends and correlations. By visually inspecting the line chart, we may be able to identify outliers present in the data; a recursive approach is then used to remove them.
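The notes do not say which recursive approach is meant. As a rough sketch under stated assumptions, one common choice is to drop points that lie more than a fixed number of standard deviations from the mean, recompute the statistics, and repeat until nothing more is removed. The function name remove_outliers, the z_limit threshold, and the readings below are all hypothetical, and the repetition is written as a loop rather than as true recursion.

```python
import statistics

def remove_outliers(series, z_limit=2.5):
    """Repeatedly drop points more than z_limit standard deviations from the mean.

    Assumed rule for illustration only; the threshold is not from the notes.
    """
    current = list(series)
    while len(current) > 2:
        mean = statistics.mean(current)
        spread = statistics.stdev(current)
        if spread == 0:
            break  # all remaining values are identical
        kept = [x for x in current if abs(x - mean) <= z_limit * spread]
        if len(kept) == len(current):
            break  # nothing removed in this pass, so the result is stable
        current = kept
    return current

# Hypothetical readings with one large spike (95) and one smaller spike (45).
readings = [20, 21, 19, 22, 20, 21, 95, 20, 19, 21, 22, 20, 45, 21, 20]
cleaned = remove_outliers(readings)
print(cleaned)
```

In this sample the large spike inflates the standard deviation enough to hide the smaller one on the first pass, which is why a single pass may not be sufficient.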

Multiline chart – A single line chart shows the trend of a single variable, whereas a multiline chart shows the trends of multiple variables.

Grouping the data and using dot plots – Dot plots are used for small to medium-sized datasets. For large datasets, a histogram is usually used.
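To make the difference concrete, the sketch below builds a text-only dot plot (one dot per observation) and a simple binned count of the kind a histogram displays. The datasets, the histogram helper, and the bin width are made up for illustration; a plotting library would normally draw these.

```python
from collections import Counter

# Hypothetical small dataset: number of defects found per inspection.
defects = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3, 2, 1]

counts = Counter(defects)

# Dot plot: one dot per observation, lined up beside each distinct value.
dot_rows = [f"{value} | {'.' * counts[value]}" for value in sorted(counts)]
print("\n".join(dot_rows))

def histogram(values, bin_width):
    """Count observations per bin; each bin covers [start, start + bin_width)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Histogram: individual values are pooled into bins instead of shown one by one.
binned = histogram([3, 7, 12, 15, 18, 21, 22, 29, 31, 38], bin_width=10)
print(binned)
```

The dot plot keeps every observation visible, which is only practical for small data; the histogram trades that detail for a compact view.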

Using heat maps – In a heat map, the data is represented as a matrix in which the range of values taken by the attributes is represented as colour gradients.
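The mapping from values to a gradient can be sketched without any plotting library. In the illustrative example below, a ramp of characters stands in for the colour gradient; the matrix, the SHADES ramp, and the to_heat_rows helper are assumptions made for this sketch.

```python
# Hypothetical matrix: rows are samples, columns are attributes.
matrix = [
    [0.0, 2.5, 5.0],
    [7.5, 10.0, 1.0],
]

SHADES = " .:*#"  # light to dark; stands in for a colour gradient

def to_heat_rows(rows, shades=SHADES):
    """Scale every value to the ramp using the global minimum and maximum."""
    flat = [v for row in rows for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid division by zero for constant data
    out = []
    for row in rows:
        # Map each value to the nearest position on the shade ramp.
        idx = [round((v - low) / span * (len(shades) - 1)) for v in row]
        out.append("".join(shades[i] for i in idx))
    return out

heat_rows = to_heat_rows(matrix)
for line in heat_rows:
    print(repr(line))
```

A real heat map applies the same idea with a continuous colour scale instead of five characters.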

Performing summary statistics and plots – If the given data is unimodal, that is, it has only one peak, the mean, which gives the location, and the standard deviation, which gives the spread (it is the square root of the variance), are valuable metrics. If the underlying data is not unimodal, that is, it has multiple peaks, these quantities (mean, median, standard deviation) may not be of much use. The mean is very sensitive to outliers, so for data with many outliers the median and percentiles are more useful.
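The sensitivity of the mean to outliers can be checked with a small example. The sketch below uses Python's standard statistics module on a made-up sample in which a single extreme value is added to nine typical readings.

```python
import statistics

# Hypothetical sample: nine typical readings plus one extreme value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 100]

mean_all = statistics.mean(values)
median_all = statistics.median(values)
stdev_all = statistics.stdev(values)  # sample standard deviation

# The same data with the extreme value left out.
trimmed = values[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)
stdev_trimmed = statistics.stdev(trimmed)

print(f"with outlier:    mean={mean_all:.2f} median={median_all} stdev={stdev_all:.2f}")
print(f"without outlier: mean={mean_trimmed:.2f} median={median_trimmed} stdev={stdev_trimmed:.2f}")
```

The single extreme value moves the mean and inflates the standard deviation, while the median stays where most of the data lies.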

Using a box-and-whisker plot – A box-and-whisker plot is a good companion to summary statistics for viewing the statistical summary of the data at hand. It consists of the following features: a line across the box indicating the median; a box spanning the interquartile range, which measures the dispersion; and a set of whiskers extending from the central box (vertically or horizontally, depending on the plot's orientation), which indicate the tails of the distribution.
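The quantities behind such a plot can be computed directly. The following is a minimal sketch on a made-up sample. It assumes the common Tukey convention that whiskers reach the most extreme points within 1.5 times the interquartile range of the box; the notes do not specify a whisker rule, and plotting tools differ on it and on how quartiles are interpolated.

```python
import statistics

# Hypothetical sample, small enough to check by hand.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]

# Quartiles; method="inclusive" interpolates between the sample's own min and max.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # the height (or width) of the box

# Assumed whisker rule (Tukey): points beyond 1.5 * IQR from the box
# are treated as outliers and drawn individually.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"box: {q1} to {q3}, median line at {median}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

The box and median summarise the centre of the data, while the whisker ends and any separately marked points describe the tails.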


