Processing data from bash

Compressing Files

# compress data/tmp.txt into temp.tar.gz
$ tar -zcvf temp.tar.gz data/tmp.txt
# -z - compress the archive with the gzip algorithm
# -c - create an archive
# -v - verbose, list files as they are added to the archive
# -f - archive file name

Decompressing Files

# extract the archive into the current directory
$ tar -zxvf temp.tar.gz
# -x - extract files

# extract the archive into a particular directory
$ tar -zxvf temp.tar.gz -C /tmp
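
The compress and extract steps can be checked end to end with a small round trip. This is only a sketch: the throwaway directory from mktemp and the file contents are assumptions made for illustration, and -t (list the archive contents) is the one flag not covered in the notes above.

```shell
# Round-trip sketch: everything lives under a throwaway directory (assumed layout)
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/out"
printf 'hello\n' > "$workdir/data/tmp.txt"

# -C switches directory first, so the archive stores the relative path data/tmp.txt
tar -zcf "$workdir/temp.tar.gz" -C "$workdir" data/tmp.txt

# -t lists the archive contents without extracting anything
listing=$(tar -ztf "$workdir/temp.tar.gz")
echo "$listing"

# extract into a separate directory and compare with the original file
tar -zxf "$workdir/temp.tar.gz" -C "$workdir/out"
cmp "$workdir/data/tmp.txt" "$workdir/out/data/tmp.txt" && roundtrip=ok
echo "$roundtrip"
```

Listing with -t before extracting is a cheap way to see which paths an archive will create.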

Processing CSV files in bash – csvkit is a suite of command-line tools for processing CSV files.

# installing csvkit on ubuntu
$ sudo pip install csvkit

# getting data from web
$ wget https://github.com/onyxfish/csvkit/raw/master/examples/realdata/ne_1033_data.xlsx

# convert an xlsx file to CSV and show the first 3 lines
$ in2csv data-science-cmd-line/book/ch03/data/imdb-250.xlsx | head -n 3

# .xlsx files are not human-readable as they are; after converting to CSV, csvcut selects columns and csvlook renders the result as a readable table
$ in2csv data-science-cmd-line/book/ch03/data/imdb-250.xlsx | head -n 3 | csvcut -c Title,Year,Rating | csvlook
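
To try csvcut and csvlook without the spreadsheet, a tiny CSV can be written by hand. The sketch below uses made-up rows (they are not taken from imdb-250.xlsx) and skips the csvkit calls if csvkit is not installed.

```shell
# Made-up sample data in a throwaway file, so nothing has to be downloaded
csv=$(mktemp)
cat > "$csv" <<'EOF'
Title,Year,Rating
Example Film A,1994,9.2
Example Film B,1972,9.1
Example Film C,2008,8.9
EOF

header=$(head -n 1 "$csv")

if command -v csvcut >/dev/null 2>&1; then
  # keep two columns by name, then render them as a table
  csvcut -c Title,Rating "$csv" | csvlook
  # columns can also be selected by 1-based position
  csvcut -c 1,3 "$csv" | head -n 2
else
  echo "csvkit is not installed; skipping the csvcut/csvlook calls" >&2
fi
```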

Querying relational databases from bash – If the data is stored in a SQL database, the sql2csv command can query it and return the result as CSV. sql2csv supports SELECT, INSERT, UPDATE, and DELETE queries.

# select specific rows from a SQL database
$ sql2csv --db 'sqlite:///data-science-cmd-line/book/ch03/data/iris.db' --query 'SELECT * FROM iris WHERE sepal_length > 5.5'
# the --db option takes the connection URL of the SQL database
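
If iris.db is not at hand, the same kind of query can be tried on a scratch database. This sketch assumes csvkit's csvsql is available to load a CSV into SQLite; the table name flowers, the file names, and the rows are all invented for illustration.

```shell
# Scratch data: a made-up table with two columns
workdir=$(mktemp -d)
cat > "$workdir/flowers.csv" <<'EOF'
species,sepal_length
setosa,5.1
versicolor,6.4
virginica,5.9
EOF

# mktemp returns an absolute path, so the URL ends up as sqlite:////...
db_url="sqlite:///$workdir/demo.db"

if command -v csvsql >/dev/null 2>&1 && command -v sql2csv >/dev/null 2>&1; then
  # load the CSV into a table named "flowers" (hypothetical name)
  csvsql --db "$db_url" --tables flowers --insert "$workdir/flowers.csv"
  # run a filtered SELECT and render the CSV result as a table
  sql2csv --db "$db_url" --query 'SELECT species, sepal_length FROM flowers WHERE sepal_length > 5.5' | csvlook
else
  echo "csvkit is not installed; skipping the database steps" >&2
fi
```

Because sql2csv writes plain CSV to standard output, its result can be piped into csvcut, csvlook, or head like any other CSV.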

Reading data from web APIs – Web APIs return data in a structured format, such as JSON or XML, which other tools can process easily. jq, for example, handles JSON.

# fetch the response quietly (-s) and pretty-print the JSON with jq
$ curl -s http://api.randomuser.me | jq '.'
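
Beyond pretty-printing with '.', jq filters can pull out individual fields. The sketch below works on a made-up JSON string, not the real response of any API, so the field names results, name, and email are assumptions. It skips the jq calls if jq is not installed.

```shell
# Made-up JSON standing in for an API response (hypothetical field names)
json='{"results":[{"name":"Ada","email":"ada@example.com"},{"name":"Grace","email":"grace@example.com"}]}'

if command -v jq >/dev/null 2>&1; then
  # -r prints raw strings instead of JSON-quoted ones
  first_email=$(printf '%s' "$json" | jq -r '.results[0].email')
  echo "$first_email"

  # count the records in the array
  count=$(printf '%s' "$json" | jq '.results | length')
  echo "$count"

  # emit one CSV row per record, ready for the csvkit tools
  printf '%s' "$json" | jq -r '.results[] | [.name, .email] | @csv'
else
  echo "jq is not installed; skipping" >&2
fi
```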