Pandas astype throwing invalid literal for int() with base 10 error
I have a pandas DataFrame df whose column names and dtypes are specified in another file (read as data_dict). To load the data with the right types, I am using the code below:
col_list = data_dict['name'].tolist()
dtype_list = data_dict['type'].tolist()
dtype_dict = {col_list[i]: dtype_list[i] for i in range(len(col_list))}
df.columns = col_list
df = df.fillna(0)
df = df.astype(dtype_dict)
But it throws this error:
invalid literal for int() with base 10: '2.230'
Most of the answers I found online recommend using pd.to_numeric() or something like df[col1].astype(float).astype(int). The issue is that df contains 50+ columns, of which around 30 should be converted to integer type, so I don't want to convert the data types one column at a time.
So how can I easily fix this error?
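For context, the error comes from casting a string that holds a decimal number straight to an integer: int('2.230') is not valid, while float('2.230') is. A minimal sketch of that behaviour, using a made-up Series rather than the asker's data:

```python
import pandas as pd

# Hypothetical sample values; only '2.230' comes from the error message.
s = pd.Series(["1", "2.230", "7"], dtype=object)

# Casting the strings directly to int fails on '2.230'.
try:
    s.astype(int)
    direct_ok = True
except ValueError:
    direct_ok = False

# Parsing as float first succeeds; the later int cast drops the fraction.
via_float = s.astype(float).astype(int)
print(direct_ok, via_float.tolist())
```

Note that the float-to-int step truncates, so 2.230 silently becomes 2.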
Try boolean masking:
mask=df.apply(lambda x:x.str.isalpha(),1).fillna(False)
Finally:
df[~mask]=df[~mask].astype(float).astype(int)
Or:
cols=df[~mask].dropna(axis=1).columns
df[cols]=df[cols].astype(float).astype(int)
# pd.to_numeric only accepts 1-D input, so apply it column by column
df[col_list] = df[col_list].apply(pd.to_numeric)
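Because the target types are already collected in dtype_dict, one option is to pick out the integer columns from it and parse only those as numbers before the final astype call. The sketch below is an illustration under assumptions: the sample df and the dtype strings ('int64', 'float64', 'object') are invented stand-ins, and the real data_dict may spell its types differently.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df and dtype_dict.
df = pd.DataFrame({"a": ["1", "2.230"], "b": ["3.5", "4"], "c": ["x", "y"]})
dtype_dict = {"a": "int64", "b": "float64", "c": "object"}

# Columns whose target dtype is a signed or unsigned integer type.
int_cols = [
    col for col, typ in dtype_dict.items()
    if pd.api.types.pandas_dtype(typ).kind in "iu"
]

# Parse those columns as numbers first, so '2.230' becomes 2.23
# instead of raising inside int().
df[int_cols] = df[int_cols].apply(pd.to_numeric)

# The cast now succeeds; float -> int drops the fractional part.
df = df.astype(dtype_dict)
print(df.dtypes)
```

This handles all integer columns in one pass without naming them individually. Any missing values still need to be filled beforehand (as the question does with fillna(0)), since NaN cannot be cast to a plain integer dtype.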
You can set the data type of the whole DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': list(map(str, np.random.rand(10))), 'B': np.random.rand(10)})
df.apply(pd.to_numeric)
A B
0 0.493771 0.389934
1 0.991265 0.387819
2 0.398947 0.128031
3 0.869156 0.007609
4 0.129748 0.532235
5 0.993632 0.882933
6 0.244311 0.213737
7 0.773192 0.229257
8 0.392530 0.339418
9 0.732609 0.685258
And for just some columns, like this:
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_numeric)
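If some entries in a column cannot be parsed at all, pd.to_numeric raises by default. Its errors='coerce' option replaces such entries with NaN instead, which can then be filled before an integer cast. A small sketch with invented sample values (filling with 0 is an assumption borrowed from the question's fillna(0)):

```python
import pandas as pd

# Hypothetical sample; 'oops' stands in for an unparseable entry.
df = pd.DataFrame({"A": ["1.5", "oops", "3"]})

# errors="coerce" turns unparseable entries into NaN instead of raising.
df["A"] = pd.to_numeric(df["A"], errors="coerce")

# NaN cannot live in a plain int column, so fill it before casting.
df["A"] = df["A"].fillna(0).astype(int)
print(df["A"].tolist())
```

Whether silently mapping bad values to 0 is acceptable depends on the data, so this is a choice to make deliberately.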
In case you want a way to convert types to float for the whole DataFrame when you do not know which columns hold numbers, you can use this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list(map(str, np.random.rand(10))), 'B': np.random.rand(10), 'C': list('ABCDEFGHIJ')})

def to_num(df):
    # Try to convert each column; leave non-numeric columns untouched.
    for col in df:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            continue
    return df

df.pipe(to_num)
A B C
0 0.762027 0.095877 A
1 0.647066 0.931435 B
2 0.016939 0.806675 C
3 0.260255 0.346676 D
4 0.561694 0.551960 E
5 0.561363 0.675580 F
6 0.312432 0.498806 G
7 0.353007 0.203697 H
8 0.418549 0.128924 I
9 0.728632 0.600307 J