
Not all cells changed by pandas.to_numeric

I have a dataframe with 30 columns and 1000 rows. It is read using:

df = pd.read_excel(filepath, sheet_name = "sheetname", header = 2)

Within the Excel file, floats use a comma as the decimal separator, but since Excel recognizes them as numbers this is not a problem. Unfortunately, some cells need more attention.

In order to understand the data, I printed all string values.

print(df['Column_name'][df['Column_name'].apply(lambda x: type(x) == str)])

returns:

862     10-12,5
863     10-12,5
864     10-12,5
865     10-12,5
866     10-12,5
867     10-12,5
868     10-12,5
1129      8-12
1130      8-12
1131      8-12
1132      8-12
1133      8-12
Name: Column_name, dtype: object

Adding the following lines:

df['Column_name'] = df['Column_name'].str.split('-').str[-1]
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='ignore')
print(df['Column_name'][df['Column_name'].apply(lambda x: type(x) == str)])

still returns:

862       12,5
863       12,5
864       12,5
865       12,5
866       12,5
867       12,5
868       12,5
1129        12
1130        12
1131        12
1132        12
1133        12
Name: Column_name, dtype: object

Why are they still strings? I understand why the values that still contain a comma have not changed, but I do not understand why the others are unchanged.

I also tried using:

df['Column_name'] = df['Column_name'].apply(lambda x: str(x).replace(',','.'))

But it messes up all the values that are already floats, and everything becomes NaN.

This happens because errors='ignore' is used in to_numeric: if any value fails to convert, the values are returned without conversion.

If 'ignore', then invalid parsing will return the input.

So replace , with . first, and use errors='coerce' so that values that cannot be converted become missing values:

df['Column_name'] = df['Column_name'].str.split('-').str[-1].str.replace(",", ".")
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='coerce')
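To illustrate the two lines above, here is a minimal sketch on a small Series built from the values shown in the question. The extra "n/a" entry and its index label are made up for illustration; they only show what errors='coerce' does with a value that cannot be parsed.

```python
import pandas as pd

# Sample values from the question, plus one hypothetical unparseable entry.
s = pd.Series(["10-12,5", "8-12", "n/a"], index=[862, 1129, 1200])

# Keep the part after the last "-", then turn the decimal comma into a dot.
cleaned = s.str.split("-").str[-1].str.replace(",", ".", regex=False)

# Values that still cannot be parsed become NaN instead of blocking the conversion.
result = pd.to_numeric(cleaned, errors="coerce")

print(result.loc[862])   # 12.5
print(result.loc[1129])  # 12.0
print(result.loc[1200])  # nan
```

Because every value is now either a float or NaN, the whole column gets a float dtype.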

EDIT:

If there may be trailing whitespace, remove it first:

df['Column_name'] = (df['Column_name'].astype(str)
                                      .str.strip()
                                      .str.split('-')
                                      .str[-1]
                                      .str.replace(",", "."))
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='coerce')
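One caveat with a mixed column like the one in the question (floats from Excel plus a few strings): the .str methods return NaN for cells that are not strings, and converting everything with astype(str) first means a negative float such as -3.0 would also be split on its minus sign. A possible alternative, sketched below under the assumption that only the string cells need cleaning, is to touch just those cells. The sample values and the names col, is_str, fixed and naive are made up for illustration.

```python
import pandas as pd

# Hypothetical mixed column: real floats plus strings that need cleaning.
col = pd.Series([7.5, "10-12,5", " 8-12 ", 3.0], dtype=object)

# Without astype(str), the .str accessor yields NaN for the float cells.
naive = pd.to_numeric(
    col.str.split("-").str[-1].str.replace(",", ".", regex=False),
    errors="coerce",
)

# Clean only the cells that are actually strings; leave the floats alone.
is_str = col.map(lambda x: isinstance(x, str))
fixed = col.copy()
fixed.loc[is_str] = (
    col[is_str]
    .str.strip()
    .str.split("-")
    .str[-1]
    .str.replace(",", ".", regex=False)
)
fixed = pd.to_numeric(fixed, errors="coerce")

print(fixed.tolist())  # [7.5, 12.5, 12.0, 3.0]
```

Whether this is preferable to astype(str) depends on the data; if the column never holds negative numbers, the shorter chain above is likely enough.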

If you have this dataframe:

     Column_name
862      10-12,5
863      10-12,5
864      10-12,5
865      10-12,5
866      10-12,5
867      10-12,5
868      10-12,5
1129        8-12
1130        8-12
1131        8-12
1132        8-12
1133        8-12

Then:

df["Column_name"] = (
    df["Column_name"].str.split("-").str[-1].str.replace(",", ".").astype(float)
)
print(df)

Prints:

      Column_name
862          12.5
863          12.5
864          12.5
865          12.5
866          12.5
867          12.5
868          12.5
1129         12.0
1130         12.0
1131         12.0
1132         12.0
1133         12.0
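Note that astype(float) works here because every cleaned value is a valid number. As a small sketch of the difference, assuming a hypothetical column that also contains an entry that cannot be parsed ("unknown" is made up for illustration), astype(float) raises while pd.to_numeric with errors='coerce' returns NaN for that cell:

```python
import pandas as pd

# Hypothetical cleaned column with one entry that is not a number.
s = pd.Series(["12.5", "12", "unknown"], dtype=object)

# astype(float) fails as soon as one value cannot be converted.
try:
    s.astype(float)
    raised = False
except ValueError:
    raised = True

# to_numeric with errors="coerce" converts what it can and uses NaN otherwise.
coerced = pd.to_numeric(s, errors="coerce")

print(raised)           # True
print(coerced.iloc[0])  # 12.5
print(coerced.iloc[2])  # nan
```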
