
How to read the columns of a text file into a pandas DataFrame in Python?

I have an original text file, original.txt:

tmin,       tmax,     mean, fmin, fmax, stdev
0,        0.005000,    0,     0,    0,    0
0.005000, 0.010000,    0,     0,    0,    0

To do calculations on it, I read this file as a CSV:

>>> import pandas as pd
>>> import numpy as np
>>> from pandas import Series, DataFrame

>>> df = pd.read_csv('original.txt')
>>> df
      tmin   tmax   mean   fmin   fmax   stdev
0    0.000  0.005      0      0      0       0
1    0.005  0.010      0      0      0       0

When I enter df.columns, I get:

Index([u'tmin', u' tmax', u' mean', u' fmin', u' fmax', u' stdev'], dtype='object')

What is u'? Also, I want to get individual values such as df.tmin[0], df.tmax[0], df.mean[0], df.fmin[0], df.fmax[0], df.stdev[0], etc.

When I enter df.tmax[0], the error below occurs:

>>> df.tmax[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 1947, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'tmax'

How can I solve this problem?

There are spaces in your column names:

Index([u'tmin', u' tmax', u' mean', u' fmin', u' fmax', u' stdev'], dtype='object')

By default, read_csv splits fields on commas only, so the whitespace that follows each comma is kept as part of the field. As a result, df ends up with a column named u' tmax' instead of u'tmax', for instance.

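To parse the file correctly, use

df = pd.read_csv('original.txt', sep=r',\s*', engine='python')

instead. The regex pattern ,\s* matches a literal comma followed by zero or more whitespace characters. engine='python' is passed explicitly because regex separators are not supported by the default 'c' engine.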

Notice that now the column names do not include spaces:

In [117]: df.columns
Out[117]: Index(['tmin', 'tmax', 'mean', 'fmin', 'fmax', 'stdev'], dtype='object')
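As a rough, self-contained sketch of the difference, the snippet below parses the same data from an in-memory string rather than from original.txt (the `text` variable and the use of `io.StringIO` are assumptions made only so the example runs without a file on disk). It compares the default parse with the regex-separator parse:

```python
import io
import pandas as pd

# Same content as original.txt, held in memory for illustration only.
text = (
    "tmin,       tmax,     mean, fmin, fmax, stdev\n"
    "0,        0.005000,    0,     0,    0,    0\n"
    "0.005000, 0.010000,    0,     0,    0,    0\n"
)

# Default parsing splits on ',' only, so the padding stays in the column names.
raw = pd.read_csv(io.StringIO(text))
raw_columns = list(raw.columns)

# A regex separator swallows the padding after each comma.
# Regex separators are handled by the python parser engine.
fixed = pd.read_csv(io.StringIO(text), sep=r",\s*", engine="python")
fixed_columns = list(fixed.columns)

# With clean names, the lookup from the question succeeds.
tmax_first = fixed["tmax"][0]
```

Here `raw_columns` still contains padded names after the first column, while `fixed_columns` holds the bare names, which is why `fixed["tmax"]` resolves.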

u'...' is how Python 2 represents a unicode string.


As UMax points out in a now-deleted answer, you could alternatively use

df = pd.read_csv('original.txt', skipinitialspace=True)

This avoids using a regex pattern for the delimiter. Since regex delimiters are supported only by the 'python' parser engine, and the 'c' engine (which understands skipinitialspace=True) is faster, UMax's alternative should be faster, especially for large files.
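A minimal sketch of the skipinitialspace variant, again reading from an in-memory string instead of the real file (the `text` variable is a stand-in introduced here for illustration):

```python
import io
import pandas as pd

# Same content as original.txt, held in memory for illustration only.
text = (
    "tmin,       tmax,     mean, fmin, fmax, stdev\n"
    "0,        0.005000,    0,     0,    0,    0\n"
    "0.005000, 0.010000,    0,     0,    0,    0\n"
)

# skipinitialspace=True drops the spaces that follow each delimiter,
# so no regex separator is needed.
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)

columns = list(df.columns)
tmax_first = df["tmax"][0]
tmin_second = df.loc[1, "tmin"]
```

The column names come out without leading spaces, so the lookups on "tmax" and "tmin" work as the question intended.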

The u' prefix indicates a unicode string.

To get the values of tmin, tmax, etc., just enter

df.tmin

for the entire column, or equivalently

df['tmin']

You can get specific values by doing the following:

df.tmin[0]

or

df['tmin'][0]
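One caveat worth illustrating: attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute. The question's file has a column called mean, and df.mean is the DataFrame.mean method, so that column needs bracket access. The sketch below assumes the file was read with skipinitialspace=True so the names carry no leading spaces, and it uses an in-memory `text` string as a stand-in for original.txt:

```python
import io
import pandas as pd

# Same content as original.txt, held in memory for illustration only.
text = (
    "tmin,       tmax,     mean, fmin, fmax, stdev\n"
    "0,        0.005000,    0,     0,    0,    0\n"
    "0.005000, 0.010000,    0,     0,    0,    0\n"
)
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)

# Attribute access works for a plain column name such as tmin.
tmin_first = df.tmin[0]

# df.mean resolves to the DataFrame.mean method, not the column,
# so the "mean" column has to be reached with brackets.
mean_is_method = callable(df.mean)
mean_first = df["mean"][0]

# .loc selects one value by row label and column name in a single step.
tmax_first = df.loc[0, "tmax"]
```

Bracket access works for every column name, so it is the safer default when names come straight from a file header.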
