I'm trying to read this tab-delimited file into pandas with one caveat: the last column (mean) must be converted from a string representing a value in scientific notation to a numpy.float64.
So far, I've tried:
df = pd.DataFrame(pd.io.parsers.read_table(fle, converters={'mean': lambda x: np.float64(x)}))
but all I get in df['mean'] is 0 and -0.
I've also tried importing without the converters kwarg and later casting the column with df['mean'].astype(np.float64), with similar results.
What gives?
They are not zero. pandas probably does some formatting while printing a DataFrame/Series, so they look like zero.
By the way, you don't need converters. read_table correctly identifies them as float64:
In [117]: df = pandas.read_table('gradStat_mmn.tdf')
In [118]: df.ix[0:10]
Out[118]:
Subject Group Local Global Attn mean
0 1 DSub S S Attn 0
1 1 DSub S S Dist 0
2 1 DSub D S Attn 0
3 1 DSub D S Dist 0
4 1 DSub S D Attn 0
5 1 DSub S D Dist 0
6 1 DSub D D Attn 0
7 1 DSub D D Dist 0
8 2 ASub S S Attn 0
9 2 ASub S S Dist 0
10 2 ASub D S Attn 0
In [119]: df['mean'].dtype
Out[119]: dtype('float64')
In [120]: df['mean'][0]
Out[120]: 3.2529000000000002e-22
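To check this without the original file, here is a minimal sketch that parses a small in-memory sample instead. The sample rows are made up for illustration (the negative value is invented to mirror the "-0" from the question), and io.StringIO is only a stand-in for the real file path.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the tab-delimited file; these rows are
# illustrative and not the real contents of gradStat_mmn.tdf.
sample = (
    "Subject\tGroup\tLocal\tGlobal\tAttn\tmean\n"
    "1\tDSub\tS\tS\tAttn\t3.2529e-22\n"
    "1\tDSub\tS\tS\tDist\t6.0101e-22\n"
    "1\tDSub\tD\tS\tAttn\t-4.2157e-22\n"
)

# No converters: the parser infers a float dtype from the scientific notation.
df = pd.read_table(io.StringIO(sample))

mean_dtype = df["mean"].dtype
all_nonzero = bool((df["mean"] != 0).all())
first_value = df["mean"].iloc[0]

# repr() shows the stored value rather than the table display.
print(mean_dtype, all_nonzero, repr(first_value))
```

If the dtype is float64 and no entry compares equal to zero, the zeros in the printed table are purely a display matter.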
This has been fixed with version 0.9 of pandas:
In [4]: df = pandas.read_table('http://dl.dropbox.com/u/6160029/gradStat_mmn.tdf')
In [5]: df.head()
Out[5]:
Subject Group Local Global Attn mean
0 1 DSub S S Attn 3.252900e-22
1 1 DSub S S Dist 6.010100e-22
2 1 DSub D S Attn 4.215700e-22
3 1 DSub D S Dist 8.308100e-22
4 1 DSub S D Attn 2.983500e-22
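If you would rather control the notation yourself than rely on the default display, one option is the display.float_format setting. This is only a sketch on a hand-built Series (the two values are taken from the output above), and the four-digit precision is an arbitrary choice.

```python
import pandas as pd

# Two of the values shown above, placed in a small illustrative Series.
s = pd.Series([3.2529e-22, 6.0101e-22], name="mean")

# Temporarily render floats in scientific notation with four decimal digits.
# option_context restores the previous display setting on exit.
with pd.option_context("display.float_format", "{:.4e}".format):
    rendered = s.to_string()

print(rendered)
```

This only changes how the values are printed; the stored float64 data is untouched.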