Not all cells are changed by pandas.to_numeric

Question

I have a dataframe with 30 columns and 1000 rows. It is read using:

df = pd.read_excel(filepath, sheet_name = "sheetname", header = 2)

In the Excel file, floats use a comma as the decimal separator, but since Excel recognizes them as numbers this is not a problem. Unfortunately, some cells need more attention.

In order to understand the data, I printed all string values:

print(df['Column_name'][df['Column_name'].apply(lambda x: type(x) == str)])

returns:

862     10-12,5
863     10-12,5
864     10-12,5
865     10-12,5
866     10-12,5
867     10-12,5
868     10-12,5
1129      8-12
1130      8-12
1131      8-12
1132      8-12
1133      8-12
Name: Column_name, dtype: object

I then added the following lines:

df['Column_name'] = df['Column_name'].str.split('-').str[-1]
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='ignore')
print(df['Column_name'][df['Column_name'].apply(lambda x: type(x) == str)])

This still returns:

862       12,5
863       12,5
864       12,5
865       12,5
866       12,5
867       12,5
868       12,5
1129        12
1130        12
1131        12
1132        12
1133        12
Name: Column_name, dtype: object

Why are they still strings? I understand why the ones still containing a comma have not changed, but I do not understand the others.

I also tried using:

df['Column_name'] = df['Column_name'].apply(lambda x: str(x).replace(',','.'))

But it messes up all the values that are already floats, and everything becomes NaN.

Answer 1

Because errors='ignore' is used in to_numeric: if any value cannot be parsed, the input is returned without conversion. The documentation says:

If 'ignore', then invalid parsing will return the input.
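The all-or-nothing behaviour can be seen on a small series. This is only a sketch with made-up data: the two values mirror the ones in the question. As far as I know errors='ignore' is deprecated in recent pandas versions, so the sketch uses errors='raise' (the default) and errors='coerce' to show the same thing: one unparseable value makes the whole conversion fail, whereas 'coerce' converts what it can.

```python
import pandas as pd

# Made-up sample mirroring the question: one value with a decimal comma, one plain integer string
s = pd.Series(["12,5", "12"], index=[862, 1129])

# With errors='raise' a single bad value aborts the conversion of the whole series
try:
    pd.to_numeric(s, errors="raise")
    parsed_ok = True
except ValueError:
    parsed_ok = False

# With errors='coerce' parseable values are converted and the rest become NaN
coerced = pd.to_numeric(s, errors="coerce")

print(parsed_ok)
print(coerced)
```

So the string "12" stays a string under 'ignore' not because it is invalid, but because "12,5" in the same column is.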

So use errors='coerce' to turn values that cannot be converted into missing values, and replace , with . first:

df['Column_name'] = df['Column_name'].str.split('-').str[-1].str.replace(",", ".")
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='coerce')

EDIT:

If there may be trailing whitespace, strip it first:

df['Column_name'] = (df['Column_name'].astype(str)
                                      .str.strip()
                                      .str.split('-')
                                      .str[-1]
                                      .str.replace(",", "."))
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='coerce')
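Note that on a column mixing real floats and strings, the .str methods return NaN for the non-string elements, which is why the EDIT casts with astype(str) first. An alternative, sketched below under the assumption that the column holds either numbers or range strings like "10-12,5", is to clean only the strings and leave existing numbers untouched. The helper name to_upper_bound and the sample values are hypothetical.

```python
import pandas as pd

def to_upper_bound(value):
    """Hypothetical helper: for a range string like '10-12,5' return '12.5'; pass other values through."""
    if isinstance(value, str):
        # keep the part after the last '-', then switch to a decimal point
        return value.strip().split("-")[-1].replace(",", ".")
    return value

# Made-up sample: an existing float, two range strings, and one unparseable entry
df = pd.DataFrame({"Column_name": [7.5, "10-12,5", " 8-12 ", "n/a"]})

cleaned = df["Column_name"].map(to_upper_bound)
df["Column_name"] = pd.to_numeric(cleaned, errors="coerce")
print(df)
```

Leaving numbers alone also avoids dropping the sign of a negative float, which splitting its string form on '-' would do.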

Answer 2

If you have this dataframe:

     Column_name
862      10-12,5
863      10-12,5
864      10-12,5
865      10-12,5
866      10-12,5
867      10-12,5
868      10-12,5
1129        8-12
1130        8-12
1131        8-12
1132        8-12
1133        8-12

Then:

df["Column_name"] = (
    df["Column_name"].str.split("-").str[-1].str.replace(",", ".").astype(float)
)
print(df)

Prints:

      Column_name
862          12.5
863          12.5
864          12.5
865          12.5
866          12.5
867          12.5
868          12.5
1129         12.0
1130         12.0
1131         12.0
1132         12.0
1133         12.0
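One caveat with astype(float): it assumes every remaining string is a valid number. A minimal sketch with made-up values (the "n/a" entry is an assumption, not from the question) shows the difference from to_numeric with errors='coerce':

```python
import pandas as pd

# Made-up sample: two clean values and one entry that is not a number
s = pd.Series(["12.5", "12", "n/a"])

# astype(float) raises as soon as one value cannot be converted
try:
    s.astype(float)
    raised = False
except ValueError:
    raised = True

# to_numeric with errors='coerce' keeps going and marks the bad entry as NaN
safe = pd.to_numeric(s, errors="coerce")

print(raised)
print(safe)
```

If the column is known to be clean after the replacements, astype(float) is fine and fails loudly when that assumption breaks.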
