Pandas astype throwing invalid literal for int() with base 10 error

I have a pandas DataFrame df whose column names and dtypes are specified in another file (read as data_dict). To get the data properly, I am using the code below:

col_list = data_dict['name'].tolist()
dtype_list = data_dict['type'].tolist()
dtype_dict = dict(zip(col_list, dtype_list))  # {column name: dtype}
df.columns = col_list
df = df.fillna(0)
df = df.astype(dtype_dict)  # this is the line that fails

But it throws this error:

invalid literal for int() with base 10: '2.230'
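The message is the one Python's built-in int() raises when handed a string that is not a whole-number literal, which suggests astype is calling int() on the raw string '2.230'. A plain-Python illustration (no pandas involved):

```python
# int() only accepts strings that spell a whole number, so "2.230" is rejected
message = None
try:
    int("2.230")
except ValueError as exc:
    message = str(exc)
print(message)  # invalid literal for int() with base 10: '2.230'

# Parsing with float() first succeeds; int() then drops the fractional part
value = int(float("2.230"))
print(value)  # 2
```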

Most of the answers I found online recommend using pd.to_numeric() or something like df[col1].astype(float).astype(int). The issue is that df has 50+ columns, of which around 30 should be converted to an integer type, so I don't want to convert the data types one column at a time.
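DataFrame.astype also accepts a {column: dtype} mapping, so both steps of astype(float).astype(int) can be applied to all the integer columns at once. A minimal sketch: the tiny df and dtype_dict below are made-up stand-ins, and it assumes the type strings read from data_dict are names pandas recognizes (such as "int64", "float64", "object"):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical stand-ins for the question's df and dtype_dict
df = pd.DataFrame({"qty": ["1", "2.230"], "price": ["3.5", "4"], "name": ["x", "y"]})
dtype_dict = {"qty": "int64", "price": "float64", "name": "object"}

# Step 1: send every integer-target column through float first,
# because int() cannot parse strings such as "2.230"
float_first = {col: "float64" for col, t in dtype_dict.items() if is_integer_dtype(t)}
df = df.astype(float_first)

# Step 2: apply the full mapping; float -> int64 truncates the fractional part
df = df.astype(dtype_dict)
print(df.dtypes)
```

The float-to-int cast truncates (2.23 becomes 2); call .round() between the two steps if rounding is wanted. Any NaN left in an integer column makes step 2 raise, which the question's fillna(0) already prevents.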

So how can I easily fix this error?

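Try boolean masking:

# True for cells holding purely alphabetic strings; non-string cells give NaN, filled with False
mask = df.apply(lambda x: x.str.isalpha(), axis=1).fillna(False)

Finally:

# Fails if any cell is masked: df[~mask] turns masked cells into NaN, which cannot be cast to int
df[~mask] = df[~mask].astype(float).astype(int)

Or

# Convert only the columns that contain no masked (alphabetic) or missing cells
cols = df[~mask].dropna(axis=1).columns
df[cols] = df[cols].astype(float).astype(int)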
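The isalpha() test misses mixed strings such as 'x1', which would still break the float cast. A variation swapped in here (not part of the answer above): let pd.to_numeric(errors="coerce") decide which columns parse, then cast only those. A sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame: "a" is numeric text, "b" mixes text and numbers, "c" is already int
df = pd.DataFrame({"a": ["1", "2.230"], "b": ["x1", "3"], "c": [4, 5]})

# Try to parse every column; cells that cannot be parsed become NaN
coerced = df.apply(pd.to_numeric, errors="coerce")

# Keep only columns in which every cell parsed (no NaN), so the int cast cannot fail
fully_numeric = coerced.notna().all()
numeric_cols = coerced.columns[fully_numeric]

# Same float -> int step as the answer, restricted to those columns
df[numeric_cols] = coerced[numeric_cols].astype("int64")
print(df)
```

Like the answer, this turns every fully numeric column into an integer; for the question, you would intersect numeric_cols with the integer-typed columns listed in data_dict.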

# pd.to_numeric only accepts 1-D input (e.g. a Series), so apply it column by column
df[col_list] = df[col_list].apply(pd.to_numeric)

You can convert every column of the DataFrame to a numeric type like this:

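import numpy as np
import pandas as pd
df = pd.DataFrame({'A': list(map(str, np.random.rand(10))), 'B': np.random.rand(10)})  # A: floats stored as strings
df = df.apply(pd.to_numeric)  # returns a new DataFrame, so assign it back
print(df)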

          A         B
0  0.493771  0.389934
1  0.991265  0.387819
2  0.398947  0.128031
3  0.869156  0.007609
4  0.129748  0.532235
5  0.993632  0.882933
6  0.244311  0.213737
7  0.773192  0.229257
8  0.392530  0.339418
9  0.732609  0.685258

And for just some of the columns:

df[['A', 'B']] = df[['A', 'B']].apply(pd.to_numeric)

If you want to convert the whole DataFrame to numeric types when you do not know which columns contain numbers, you can use this:

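import numpy as np
import pandas as pd
df = pd.DataFrame({'A': map(str, np.random.rand(10)), 'B': np.random.rand(10), 'C': list('ABCDEFGHIJ')})

def to_num(df):
    """Convert every column pd.to_numeric can parse; leave the rest unchanged.

    Note: this modifies the passed DataFrame in place and also returns it.
    """
    for col in df:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            # Column holds non-numeric values (e.g. column C), so skip it
            continue
    return df

df.pipe(to_num)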

          A         B  C
0  0.762027  0.095877  A
1  0.647066  0.931435  B
2  0.016939  0.806675  C
3  0.260255  0.346676  D
4  0.561694  0.551960  E
5  0.561363  0.675580  F
6  0.312432  0.498806  G
7  0.353007  0.203697  H
8  0.418549  0.128924  I
9  0.728632  0.600307  J

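Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.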

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM