
How to read text file columns into a pandas DataFrame in Python?

I have a text file, original.txt:

tmin,       tmax,     mean, fmin, fmax, stdev
0,        0.005000,    0,     0,    0,    0
0.005000, 0.010000,    0,     0,    0,    0

To do some calculations, I read this file as CSV:

>>> import pandas as pd
>>> import numpy as np
>>> from pandas import Series, DataFrame
>>> df = pd.read_csv('original.txt')
>>> df

      tmin   tmax   mean   fmin   fmax   stdev
0    0.000  0.005      0      0      0       0
1    0.005  0.010      0      0      0       0

When I enter df.columns, I get:

Index([u'tmin', u' tmax', u' mean', u' fmin', u' fmax', u' stdev'], dtype='object')

What is u'? Also, I want to get values such as df.tmin[0], df.tmax[0], df.mean[0], df.fmin[0], df.fmax[0], df.stdev[0], etc.

When I enter df.tmax[0], the following error occurs:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 1947, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'tmax'

How can I solve this problem?

There are spaces in your column names:

Index([u'tmin', u' tmax', u' mean', u' fmin', u' fmax', u' stdev'], dtype='object')

By default, read_csv parses the file using only the comma as the delimiter between fields, so the whitespace that follows each comma is kept as part of the next field. As a result, df ends up with a column named u' tmax' instead of u'tmax', for instance.

To parse the file correctly, use

df = pd.read_csv('original.txt', sep=r',\s*')

instead. The regex pattern ,\s* matches a literal comma followed by zero or more whitespace characters.

Notice that now the column names do not include spaces:

In [117]: df.columns
Out[117]: Index(['tmin', 'tmax', 'mean', 'fmin', 'fmax', 'stdev'], dtype='object')
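To see the difference without creating a file, here is a minimal sketch that parses the question's sample data from memory. The io.StringIO wrapper and the variable names are assumptions made for illustration, and engine='python' is passed explicitly because regex separators are handled by the Python parser engine.

```python
import io
import pandas as pd

# The sample data from the question, held in memory so the sketch is self-contained.
raw = (
    "tmin,       tmax,     mean, fmin, fmax, stdev\n"
    "0,        0.005000,    0,     0,    0,    0\n"
    "0.005000, 0.010000,    0,     0,    0,    0\n"
)

# Default parsing splits on the comma only, so the padding stays in the names.
df_default = pd.read_csv(io.StringIO(raw))
default_cols = list(df_default.columns)

# A regex separator consumes the padding after each comma.
df_regex = pd.read_csv(io.StringIO(raw), sep=r",\s*", engine="python")
regex_cols = list(df_regex.columns)

print(default_cols)  # names after the first still carry leading spaces
print(regex_cols)    # ['tmin', 'tmax', 'mean', 'fmin', 'fmax', 'stdev']
```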

u'...' is Python 2's way of representing a unicode string.

As UMax points out in a now-deleted answer, you could alternatively use

df = pd.read_csv('original.txt', skipinitialspace=True)

This avoids using a regex pattern for the delimiter. Regex delimiters are supported only by the 'python' parser engine, whereas skipinitialspace=True is understood by the faster 'c' engine, so UMax's alternative should be faster, especially for large files.
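As a quick check of that alternative, the sketch below parses the question's sample data with skipinitialspace=True. The in-memory io.StringIO source and the variable names are assumptions for illustration only; no timing claim is made here.

```python
import io
import pandas as pd

# Same sample data as in the question, kept in memory for a self-contained sketch.
raw = (
    "tmin,       tmax,     mean, fmin, fmax, stdev\n"
    "0,        0.005000,    0,     0,    0,    0\n"
    "0.005000, 0.010000,    0,     0,    0,    0\n"
)

# skipinitialspace=True drops the spaces that follow each comma,
# in the header row as well as in the data rows.
df_skip = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
skip_cols = list(df_skip.columns)

# With clean names, the lookup that failed in the question now succeeds.
first_tmax = df_skip["tmax"][0]

print(skip_cols)
print(first_tmax)
```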

The u' prefix indicates a unicode string.

To get the values of tmin, tmax, etc., just enter


for the entire column, or


You can get specific values by doing the following:
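A minimal sketch of such access, which is an illustration under assumptions rather than the answerer's original snippets: it assumes the file was parsed with skipinitialspace=True so the column names are clean, and it reads the question's sample data from memory. Bracket indexing is used because a column named mean collides with the DataFrame.mean method, so df.mean refers to the method rather than the column.

```python
import io
import pandas as pd

# Sample data from the question, parsed so that column names have no padding.
raw = (
    "tmin,       tmax,     mean, fmin, fmax, stdev\n"
    "0,        0.005000,    0,     0,    0,    0\n"
    "0.005000, 0.010000,    0,     0,    0,    0\n"
)
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Entire column, returned as a Series.
tmax_column = df["tmax"]

# A single value, selected by row label and column name.
second_tmax = df.loc[1, "tmax"]

# df.mean is the DataFrame.mean method, not the "mean" column,
# so the column has to be reached with brackets.
mean_is_method = callable(df.mean)
first_mean = df["mean"][0]

print(tmax_column.tolist())
print(second_tmax, first_mean, mean_is_method)
```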




