Brian Blaylock's Python Blog: October 2013

Wednesday, October 23, 2013

Simple file import using Numpy's "genfromtxt" function

I've always had trouble importing data into Python the way I want it. I like to use numpy's genfromtxt function because it is simple and useful. However, for large data sets this import method can be slow. Here are some arguments of numpy.genfromtxt you should keep in mind:

Choose the right delimiter. The comma (,) is used quite often.
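Choose a data type. The default is float. When the columns have mixed data types, use dtype=None (the Python value None, not the string "None") so that the type of each column is guessed separately.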
If there are column names it is useful to choose "names=True". This makes it easy to reference columns in the data set.
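To see what these options do, here is a minimal sketch. The three-column sample text is made up for illustration (it is not the Mesowest file), and it is read from memory with io.StringIO instead of from disk. I also pass encoding="utf-8", which is my own addition: depending on your NumPy version, text columns may otherwise come back as bytes.

```python
import io
import numpy as np

# Hypothetical sample in the same comma-separated layout (not the real file).
sample = "HR,TMZN,TMPF\n13,MDT,62.1\n14,MDT,61.9\n"

# Default dtype (float): the header is skipped by hand, and the text
# column cannot be parsed as a number, so it is filled with nan.
as_float = np.genfromtxt(io.StringIO(sample), delimiter=",", skip_header=1)

# dtype=None guesses a type for each column; names=True takes the
# column names from the first line.
mixed = np.genfromtxt(io.StringIO(sample), delimiter=",", dtype=None,
                      names=True, encoding="utf-8")

print(as_float)
print(mixed.dtype.names)
print(mixed["TMZN"])
```

With the default float type the result is a plain 2-D array, while dtype=None together with names=True gives a structured array whose columns are looked up by name.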

Here is an example:

I downloaded weather data from Mesowest and saved it in a file called "WBB_2013.txt". It looks like this:

MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH,SKNT,GUST,DRCT,QFLG,PRES,SOLR,PREC,P05I,VOLT,DWPF

10,23,2013, 13,45,MDT, 62.1,32,3.9,9.2,283,2,25.30,610.9,,0.00,13.03,39.1

10,23,2013, 13,40,MDT, 61.9,32,4.8,9.6,285,2,25.30,615.7,,0.00,13.00,38.7

10,23,2013, 13,35,MDT, 62.2,31,4.4,13.2,252,2,25.30,618.6,,0.00,13.04,38.7

10,23,2013, 13,30,MDT, 61.7,31,4.1,9.0,257,2,25.30,619.4,,0.00,13.00,37.9

...etc.

The first row is the name of each variable--the "title" of the column. To import this data into Python I can use the following statement:

import numpy as np

data = np.genfromtxt('WBB_2013.txt', delimiter=',', dtype=None, names=True)

Here I have specified the file name, set delimiter=',' because the values are separated by commas, set dtype=None (because the timezone is a string while the other columns are numbers), and set names=True so that I can reference columns by the names in the first line.

I can now access each column as follows:

Month = data['MON']
Day = data['DAY']
Temperature_F = data['TMPF']
etc.
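Once the columns are available by name, they behave like ordinary numpy arrays. The sketch below is only an illustration: it reads the header and the four sample rows shown earlier from an in-memory string rather than from WBB_2013.txt, and the variable names mean_temp and warm, the 62.0 threshold, and the encoding="utf-8" argument are my own assumptions.

```python
import io
import numpy as np

# Header plus the four sample rows from above, held in memory instead of a file.
text = """MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH,SKNT,GUST,DRCT,QFLG,PRES,SOLR,PREC,P05I,VOLT,DWPF
10,23,2013, 13,45,MDT, 62.1,32,3.9,9.2,283,2,25.30,610.9,,0.00,13.03,39.1
10,23,2013, 13,40,MDT, 61.9,32,4.8,9.6,285,2,25.30,615.7,,0.00,13.00,38.7
10,23,2013, 13,35,MDT, 62.2,31,4.4,13.2,252,2,25.30,618.6,,0.00,13.04,38.7
10,23,2013, 13,30,MDT, 61.7,31,4.1,9.0,257,2,25.30,619.4,,0.00,13.00,37.9
"""

data = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=None,
                     names=True, encoding="utf-8")

# A named column is a regular array, so the usual array methods work.
mean_temp = data["TMPF"].mean()

# A boolean mask built from one column selects whole rows.
warm = data[data["TMPF"] > 62.0]

print(mean_temp)
print(warm["HR"], warm["MIN"])
```

Note that the PREC column is empty in every sample row, so there is nothing for genfromtxt to guess its type from. Check that column before relying on it.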
