Pandas astype throws "invalid literal for int() with base 10" error

Question

I have a pandas dataframe df whose column names and dtypes are specified in another file (read as data_dict). To load the data with the right types, I am using the code below:

col_list = data_dict['name'].tolist()
dtype_list = data_dict['type'].tolist()
dtype_dict = {col_list[i]: dtype_list[i] for i in range(len(col_list))}
df.columns = col_list
df = df.fillna(0)
df = df.astype(dtype_dict)

But it throws this error:

invalid literal for int() with base 10: '2.230'

Most of the answers I found online recommend using pd.to_numeric() or something like df[col1].astype(float).astype(int). The issue here is that df contains 50+ columns, of which around 30 should be converted to integer type, so I don't want to convert the data types one column at a time.

So how can I easily fix this error?

Answer 1

Try boolean masking:

mask = df.apply(lambda x: x.str.isalpha(), axis=1).fillna(False)

Finally:

df[~mask]=df[~mask].astype(float).astype(int)

Or:

cols=df[~mask].dropna(axis=1).columns
df[cols]=df[cols].astype(float).astype(int)
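
If the target types are already in a dict, as in the question, the same float-then-int step can be applied to every integer column at once. This is only a minimal sketch: it assumes dtype_dict maps column names to dtype strings, and the sample data and the name int_cols are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df and dtype_dict
df = pd.DataFrame({"a": ["2.230", "3.0"], "b": ["x", "y"], "c": ["1", "2"]})
dtype_dict = {"a": "int64", "b": "object", "c": "int64"}

# Columns whose target dtype is an integer type
int_cols = [col for col, t in dtype_dict.items()
            if pd.api.types.is_integer_dtype(t)]

# Parse those columns as float first, then cast everything to its target type
df = df.astype({col: float for col in int_cols}).astype(dtype_dict)
print(df.dtypes)
```

Keep in mind that casting float to int truncates, so '2.230' ends up as 2.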

Answer 2

# pd.to_numeric only accepts 1-D input (e.g. a Series), so apply it column by column
df[col_list] = df[col_list].apply(pd.to_numeric)
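
When a column may also hold values that are not numbers, pd.to_numeric can turn them into NaN via errors="coerce" instead of raising. A small sketch on a single made-up Series (the names s, nums and ints are illustrative), reusing the fillna(0) step from the question:

```python
import pandas as pd

s = pd.Series(["2.230", "7", "abc"])

# Unparseable entries such as 'abc' become NaN instead of raising an error
nums = pd.to_numeric(s, errors="coerce")

# Replace NaN with 0 as in the question, then cast to int (this truncates)
ints = nums.fillna(0).astype(int)
print(ints.tolist())
```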

Answer 3

You can set the data type of the whole dataframe like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': map(str, np.random.rand(10)), 'B': np.random.rand(10)})
df.apply(pd.to_numeric)

          A         B
0  0.493771  0.389934
1  0.991265  0.387819
2  0.398947  0.128031
3  0.869156  0.007609
4  0.129748  0.532235
5  0.993632  0.882933
6  0.244311  0.213737
7  0.773192  0.229257
8  0.392530  0.339418
9  0.732609  0.685258
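
Note that df.apply returns a new dataframe and leaves df itself unchanged, so the result has to be assigned to keep it. A minimal sketch that checks the dtypes before and after (the name converted is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [str(x) for x in np.random.rand(10)],
                   "B": np.random.rand(10)})

# apply() does not modify df; keep the returned frame
converted = df.apply(pd.to_numeric)

print(df.dtypes)         # column A still holds strings
print(converted.dtypes)  # both columns are now float64
```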

And for just some columns like this:

df[['A', 'B']] = df[['A', 'B']].apply(pd.to_numeric)

If you want to convert the whole dataframe to numeric types without knowing in advance which columns hold numbers, you can use this:

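import numpy as np
import pandas as pd
df = pd.DataFrame({'A': map(str, np.random.rand(10)), 'B': np.random.rand(10), 'C': [x for x in 'ABCDEFGHIJ']})

def to_num(df):
    # Try each column; leave it unchanged if it cannot be parsed as numbers
    for col in df:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            continue
    return df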

df.pipe(to_num)

          A         B  C
0  0.762027  0.095877  A
1  0.647066  0.931435  B
2  0.016939  0.806675  C
3  0.260255  0.346676  D
4  0.561694  0.551960  E
5  0.561363  0.675580  F
6  0.312432  0.498806  G
7  0.353007  0.203697  H
8  0.418549  0.128924  I
9  0.728632  0.600307  J
