Pandas read scientific notation and change
I have a dataframe in pandas that I'm reading in from a CSV. One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23.
My trouble is that as I read in the CSV, pandas views these data as an object dtype, not the float32 that it should be. I guess this is because it thinks the scientific notation entries are strings.
I've tried to convert the dtype using df['speed'].astype(float) after it's been read in, and tried to specify the dtype as it's being read in using df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). This throws the error:
ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...
So far neither of these methods has worked. Am I missing something that is an incredibly easy fix?
This question seems to suggest I can specify known numbers that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.
EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS
7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
It's hard to say without seeing your data, but it seems the problem is that your rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0, you could use the following:
df1 = df.apply(pd.to_numeric, errors='coerce')
Then you could drop rows with NA values using dropna, or fill them with zeros using fillna.
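As a minimal sketch of the steps above, applied to a single column rather than the whole dataframe: the Series below is a hypothetical stand-in for the 'speed' column, and the 'bad-value' entry is invented to represent whatever non-numeric token forces the object dtype.

```python
import pandas as pd

# Hypothetical stand-in for the 'speed' column, stored as strings (object dtype).
# 'bad-value' is an invented token standing in for any unparseable entry.
raw = pd.Series(
    ["0", "NaN", "0.14", "9.49242000872744e-05", "bad-value"],
    dtype=object,
    name="speed",
)

# errors='coerce' turns anything that cannot be parsed as a number into NaN,
# while scientific notation strings are parsed as ordinary floats.
speed = pd.to_numeric(raw, errors="coerce")

dropped = speed.dropna()   # discard the rows that are NaN
filled = speed.fillna(0)   # or keep every row and substitute zero
```

Converting only the affected column leaves other columns, such as the timestamp in the sample data, untouched.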
I realised it was the Infinity value in my data that was causing the issue. Removing it with a find and replace worked.
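An alternative to editing the file by hand, assuming it is acceptable to treat that token as missing data, might be to list it in na_values when reading. This is only a sketch: the in-memory CSV and the 'speed' header are made up for illustration.

```python
import io

import pandas as pd

# Made-up, in-memory stand-in for the CSV file; only the 'speed' column is shown.
csv_text = "speed\n0.05\n9.49242000872744e-05\nInfinity\n"

# Listing 'Infinity' in na_values makes the parser read that token as NaN,
# so the column can be parsed as a float column.
df = pd.read_csv(io.StringIO(csv_text), na_values=["Infinity"])
```

Note that this discards the distinction between an infinite value and a missing one.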
@Anton Protopopov's answer also works, as did @DSM's comment pointing out that I had not assigned the result back: df['speed'] = df['speed'].astype(float).
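The point about assignment can be illustrated with a small sketch. The dataframe here is hypothetical and holds the values as strings to mimic the object dtype described in the question.

```python
import pandas as pd

# Hypothetical dataframe whose 'speed' column holds numeric strings (object dtype).
df = pd.DataFrame(
    {"speed": pd.Series(["0.05", "9.49242000872744e-05"], dtype=object)}
)

# astype returns a new Series; without assignment the dataframe is unchanged.
df["speed"].astype(float)
unchanged = df["speed"].dtype == object

# Assigning the result back is what actually changes the column's dtype.
df["speed"] = df["speed"].astype(float)
```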
Thanks for the help.
In my case, using the pandas round() method worked.
df['column'] = df['column'].round(2)