Python – Intro to Pandas

This article is an introduction to pandas. The goal is to convince you to move away from spreadsheet programs and toward pandas for data analysis.

Most people who want reproducible data analysis use some kind of scripting language, because when editing data by hand it is easy to make a mistake, such as entering a zero for an unknown value, that breaks your model.
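To see why a zero in place of an unknown value is a problem, here is a minimal sketch. The numbers are made up for illustration and are not from any real dataset. It compares the mean of a column where the unknown value was entered as 0 with the mean of the same column where the value is marked as missing (NaN), which pandas skips by default.

```python
import pandas

# Hypothetical measurements; the last reading is unknown.
known = [10.0, 12.0, 14.0]

# Entering a 0 for the unknown reading makes it count as a real measurement.
with_zero = pandas.Series(known + [0.0])

# Marking the unknown reading as missing (NaN) keeps it out of the calculation.
with_nan = pandas.Series(known + [float("nan")])

mean_with_zero = with_zero.mean()  # the 0 drags the average down
mean_with_nan = with_nan.mean()    # NaN is skipped by default (skipna=True)

print(mean_with_zero)
print(mean_with_nan)
```

The two means differ only because of how the unknown value was recorded, which is the kind of quiet error that can end up in a model.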

If you have Anaconda installed, you get Jupyter Notebook and Spyder. There is also a Qt console and IPython, which gives you a couple of cool features.

Open Jupyter Notebook and go to File >> New. The prompt is the same one you get if you type ipython at the command prompt. If you have Anaconda installed, the scientific packages come with it.

Jupyter Notebook runs cells in real time. Shift+Enter runs the current cell and moves to the next one, creating a new cell underneath if you are at the end of the notebook.

import pandas
# read a TSV (tab-separated values) file
pandas.read_csv('../data/gapminder.tsv', sep='\t')

pandas then reads the dataset, and Jupyter Notebook prints it below the cell. If you double-click the dataset output, you can switch between scrolling and fit to screen.

df = pandas.read_csv('../data/gapminder.tsv', sep='\t')

Because the result is assigned to df, this will not print out the content. However, if you write:

df

This will then output your content in Jupyter Notebook.
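If you do not have the gapminder file at hand, the same steps can be tried with a small stand-in. This is only a sketch: the tab-separated text and its column names (name, year, value) are made up for illustration, and io.StringIO is used so that read_csv can read from a string instead of a path.

```python
import io
import pandas

# Made-up tab-separated text standing in for a .tsv file on disk.
tsv_text = "name\tyear\tvalue\nA\t2000\t1.5\nB\t2001\t2.5\n"

# read_csv accepts a file-like object as well as a path.
# Assigning the result to df stores the data without displaying it.
df = pandas.read_csv(io.StringIO(tsv_text), sep="\t")

# Outside a notebook cell, print the pieces you want to inspect.
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header line
print(df.head())         # first rows of the dataset
```

In a notebook, ending the cell with df on a line of its own gives the same rendered table as described above.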