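- Choose the right delimiter. The comma (delimiter=',') is the most common one for CSV-style files.
- Choose a data type. The default is dtype=float. When the file has mixed data types, use dtype=None (the Python value None, not the string "None") so that the type of each column is determined individually.
- If the first row contains column names, set names=True. This makes it easy to reference columns in the data set by name.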
Here is an example:
I downloaded weather data from Mesowest and saved it in a file called "WBB_2013.txt". It looks like this:
MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH,SKNT,GUST,DRCT,QFLG,PRES,SOLR,PREC,P05I,VOLT,DWPF
10,23,2013, 13,45,MDT, 62.1,32,3.9,9.2,283,2,25.30,610.9,,0.00,13.03,39.1
10,23,2013, 13,40,MDT, 61.9,32,4.8,9.6,285,2,25.30,615.7,,0.00,13.00,38.7
10,23,2013, 13,35,MDT, 62.2,31,4.4,13.2,252,2,25.30,618.6,,0.00,13.04,38.7
10,23,2013, 13,30,MDT, 61.7,31,4.1,9.0,257,2,25.30,619.4,,0.00,13.00,37.9
...etc.
The first row is the name of each variable--the "title" of the column. To import this data into Python I can use the following statement:
import numpy as np
data = np.genfromtxt('WBB_2013.txt', delimiter=',', dtype=None, names=True)
Here I have specified the file name, set delimiter=',', set dtype=None so that the type of each column is determined individually (the time zone is a string while the other columns are numbers), and set names=True so that I can reference columns by the names in the first line.
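To see why the data type matters, here is a minimal sketch that reads the same rows twice: once with the default dtype and once with dtype=None. It assumes a shortened, in-memory copy of the sample (only the first eight columns, wrapped in io.StringIO) instead of the real file, and it passes encoding='utf-8', which is my addition so that text columns come back as ordinary strings.

```python
import io
import numpy as np

# First rows of the sample, trimmed to eight columns for brevity (illustrative copy).
text = (
    "MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH\n"
    "10,23,2013, 13,45,MDT, 62.1,32\n"
    "10,23,2013, 13,40,MDT, 61.9,32\n"
)

# Default dtype (float): every column is converted to float,
# so the text in TMZN cannot be converted and is stored as nan.
as_float = np.genfromtxt(io.StringIO(text), delimiter=',', names=True)

# dtype=None: the type of each column is inferred separately.
# encoding='utf-8' is an assumption added here so TMZN is read as str.
mixed = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=None,
                      names=True, encoding='utf-8')

print(as_float['TMZN'])   # time zone column read with the default dtype
print(mixed['TMZN'])      # time zone column read with dtype=None
print(mixed.dtype)        # per-column types chosen by genfromtxt
```

Printing mixed.dtype is a quick way to check which type was picked for each column before relying on it.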
I can now access each column as follows:
- Month = data['MON']
- Day = data['DAY']
- Temperature_F = data['TMPF']
- etc.
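Once the columns are available by name, they behave like ordinary NumPy arrays. The sketch below is only an illustration under the same assumptions as before: it uses a shortened in-memory copy of the four sample rows rather than the real file, adds encoding='utf-8', and the variable names (temperature_f, warm_rows, and so on) are my own.

```python
import io
import numpy as np

# Shortened in-memory copy of the four sample rows (first eight columns only).
text = (
    "MON,DAY,YEAR,HR,MIN,TMZN,TMPF,RELH\n"
    "10,23,2013, 13,45,MDT, 62.1,32\n"
    "10,23,2013, 13,40,MDT, 61.9,32\n"
    "10,23,2013, 13,35,MDT, 62.2,31\n"
    "10,23,2013, 13,30,MDT, 61.7,31\n"
)
data = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=None,
                     names=True, encoding='utf-8')

# All column names taken from the first line of the file.
column_names = data.dtype.names

# A single column is a regular 1-D array.
temperature_f = data['TMPF']
mean_temp = temperature_f.mean()

# Convert Fahrenheit to Celsius element by element.
temperature_c = (temperature_f - 32.0) * 5.0 / 9.0

# Boolean masks select whole rows; here, rows warmer than 62 F.
warm_rows = data[temperature_f > 62.0]

print(column_names)
print(mean_temp)
print(warm_rows['MIN'])
```

Selecting rows with a mask keeps all columns together, so warm_rows['MIN'] gives the minutes of just the rows that passed the temperature test.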