This page demonstrates Python tips and tricks that I use in my everyday programming as an atmospheric science graduate student.
-Brian Blaylock

Wednesday, October 23, 2013

Simple file import using Numpy's "genfromtxt" function

I've always had trouble importing data into Python the way I want it. I like to use NumPy's genfromtxt function because it is simple and useful, although it can be slow on long data sets. Here are some arguments of numpy.genfromtxt you should keep in mind:
  • Choose the right delimiter with delimiter=. The comma (delimiter=',') is used quite often.
  • Choose a data type with dtype=. The default is float. When the columns have mixed data types, use dtype=None so that genfromtxt guesses the type of each column.
  • If the first line holds column names, set names=True. This makes it easy to reference columns in the data set by name.
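To see what the dtype choice does, here is a small sketch on a made-up three-column sample (the column names HR, TMZN and TMPF are borrowed from the MesoWest file below; the values are only for illustration). It reads the same text once with the default dtype=float and once with dtype=None. It also passes encoding=None, which asks NumPy to decode text columns into Python str (on some NumPy versions the default returns bytes instead).

```python
import io

import numpy as np

# Made-up sample: a header row, two numeric columns and one text column.
raw = "HR,TMZN,TMPF\n13,MDT,62.1\n13,MDT,61.9\n"

# Default dtype=float: every column is read as float, so the text column
# cannot be converted and is filled with nan.
as_float = np.genfromtxt(io.StringIO(raw), delimiter=',', names=True)

# dtype=None: genfromtxt guesses a type for each column separately.
mixed = np.genfromtxt(io.StringIO(raw), delimiter=',', names=True,
                      dtype=None, encoding=None)

print(as_float['TMZN'])  # [nan nan]
print(mixed['TMZN'])     # ['MDT' 'MDT']
print(mixed['HR'].dtype.kind, mixed['TMPF'].dtype.kind)  # i f
```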
Here is an example:

I downloaded weather data from Mesowest and saved it in a file called "WBB_2013.txt". It looks like this:

MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH,SKNT,GUST,DRCT,QFLG,PRES,SOLR,PREC,P05I,VOLT,DWPF
10,23,2013, 13,45,MDT, 62.1,32,3.9,9.2,283,2,25.30,610.9,,0.00,13.03,39.1
10,23,2013, 13,40,MDT, 61.9,32,4.8,9.6,285,2,25.30,615.7,,0.00,13.00,38.7
10,23,2013, 13,35,MDT, 62.2,31,4.4,13.2,252,2,25.30,618.6,,0.00,13.04,38.7
10,23,2013, 13,30,MDT, 61.7,31,4.1,9.0,257,2,25.30,619.4,,0.00,13.00,37.9
...etc.

The first row holds the name of each variable, i.e. the "title" of each column. To import this data into Python, I can use the following statement:

import numpy as np
data = np.genfromtxt('WBB_2013.txt', delimiter=',', dtype=None, names=True)

Here I have specified the file name, the delimiter (delimiter=','), the data type (dtype=None, because the time zone column holds strings while the other columns hold numbers, so genfromtxt has to guess the type of each column), and names=True so that I can reference columns by the names in the first line.
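To try the statement without downloading anything, the sketch below pastes the sample rows into an io.StringIO object, which genfromtxt accepts in place of a file name. Two extra arguments are my additions, not part of the statement above: autostrip=True removes the leading spaces in fields such as " 13" and " MDT" (without it, the time-zone strings keep their leading space), and encoding=None returns the text column as str.

```python
import io

import numpy as np

# Sample rows from WBB_2013.txt, pasted in so the sketch runs without the file.
raw = """MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH,SKNT,GUST,DRCT,QFLG,PRES,SOLR,PREC,P05I,VOLT,DWPF
10,23,2013, 13,45,MDT, 62.1,32,3.9,9.2,283,2,25.30,610.9,,0.00,13.03,39.1
10,23,2013, 13,40,MDT, 61.9,32,4.8,9.6,285,2,25.30,615.7,,0.00,13.00,38.7
10,23,2013, 13,35,MDT, 62.2,31,4.4,13.2,252,2,25.30,618.6,,0.00,13.04,38.7
10,23,2013, 13,30,MDT, 61.7,31,4.1,9.0,257,2,25.30,619.4,,0.00,13.00,37.9
"""

# autostrip=True trims whitespace around every field before conversion.
data = np.genfromtxt(io.StringIO(raw), delimiter=',', dtype=None, names=True,
                     autostrip=True, encoding=None)

print(data['TMZN'])  # ['MDT' 'MDT' 'MDT' 'MDT']
print(data['MIN'])   # [45 40 35 30]
```

Here dtype=None gives integer columns for MON, DAY, YEAR, HR and MIN, a float column for TMPF, and a string column for TMZN.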

I can now access each column as follows:
  • Month = data['MON']
  • Day = data['DAY']
  • Temperature_F = data['TMPF']
  • etc.
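Each of these columns is an ordinary one-dimensional NumPy array, so whole-column arithmetic and boolean masks work directly. A minimal sketch on a hypothetical, trimmed four-column version of the sample (the Fahrenheit-to-Celsius conversion and the 62 F threshold are only illustrations):

```python
import io

import numpy as np

# Hypothetical trimmed file: four columns copied from the WBB sample above.
raw = """HR,MIN,TMPF,RELH
13,45,62.1,32
13,40,61.9,32
13,35,62.2,31
13,30,61.7,31
"""
data = np.genfromtxt(io.StringIO(raw), delimiter=',', dtype=None,
                     names=True, encoding=None)

temperature_f = data['TMPF']                  # 1-D float array, one value per row
temperature_c = (temperature_f - 32) * 5 / 9  # converts every row at once

# A boolean mask built from one column selects whole rows of the array.
warm = data[data['TMPF'] > 62]
print(warm['MIN'])  # [45 35]
```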

3 comments:

  1. Sometimes a file will have a header or footer. To skip the header, do this:

    np.genfromtxt('filename', skip_header=N)

    where N is the number of lines to skip at the top of the file.
    You can also use skip_footer to skip lines at the bottom.

    Replies
    1. More on np.genfromtxt here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

  2. To see the list of names the function created, type:
    print(data.dtype.names)

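The tips in these comments can be combined. Below is a sketch on a made-up file (the header and footer lines are hypothetical, not from MesoWest) with two lines of header text, the column-name row, and a one-line footer: skip_header drops the lines above the names, skip_footer drops the trailing line, and data.dtype.names lists the column names genfromtxt created.

```python
import io

import numpy as np

# Hypothetical file: two header lines, column names, data rows, one footer line.
raw = """Station: WBB
Source: hypothetical header line
MON,DAY,YEAR,TMPF
10,23,2013,62.1
10,23,2013,61.9
10,23,2013,62.2
END OF DATA
"""

data = np.genfromtxt(io.StringIO(raw), delimiter=',', dtype=None,
                     names=True, skip_header=2, skip_footer=1,
                     encoding=None)

print(data.dtype.names)  # ('MON', 'DAY', 'YEAR', 'TMPF')
print(len(data))         # 3
```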
