Not all cells changed by pandas.to_numeric
I have a dataframe with 30 columns and 1000 rows. It is read using:
df = pd.read_excel(filepath, sheet_name = "sheetname", header = 2)
Within the Excel file, floats use a decimal comma, but since Excel recognizes them as numbers this is not a problem. Unfortunately, some cells need more attention.
In order to understand the data I printed all string values:
print(df['Column_name'][df['Column_name'].apply(lambda x: type(x) == str)])
returns:
862 10-12,5
863 10-12,5
864 10-12,5
865 10-12,5
866 10-12,5
867 10-12,5
868 10-12,5
1129 8-12
1130 8-12
1131 8-12
1132 8-12
1133 8-12
Name: Column_name, dtype: object
After adding the following lines:
df['Column_name'] = df['Column_name'].str.split('-').str[-1]
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='ignore')
print(df['Column_name'][df['Column_name'].apply(lambda x: type(x) == str)])
it still returns:
862 12,5
863 12,5
864 12,5
865 12,5
866 12,5
867 12,5
868 12,5
1129 12
1130 12
1131 12
1132 12
1133 12
Name: Column_name, dtype: object
Why are they still strings? I understand why the ones that still contain a comma have not changed, but I do not understand the others.
I also tried using:
df['Column_name'] = df['Column_name'].apply(lambda x: str(x).replace(',','.'))
But it messes up all the values that are already floats, and everything becomes NaN.
Because errors='ignore' is used in to_numeric: if any value fails to parse, the input is returned without converting, so even the cells that look like plain integers stay strings. From the documentation:
If 'ignore', then invalid parsing will return the input.
So replace , with . first, then use errors='coerce' so that anything that still cannot be converted becomes a missing value:
df['Column_name'] = df['Column_name'].str.split('-').str[-1].str.replace(",", ".")
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='coerce')
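To see the effect in isolation, here is a minimal sketch on a two-element Series (the sample values are made up for illustration, not taken from the asker's file). It assumes the default errors='raise' behaviour of to_numeric and shows that a single unparseable string is enough to make the whole conversion fail, whereas errors='coerce' converts what it can:

```python
import pandas as pd

# Hypothetical sample: one decimal-comma string and one clean integer string.
s = pd.Series(["12,5", "12"])

# With the default errors="raise", the decimal comma makes the call fail.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors="coerce" converts what it can and puts NaN where parsing fails.
coerced = pd.to_numeric(s, errors="coerce")

# Replacing the comma first lets every value parse.
fixed = pd.to_numeric(s.str.replace(",", ".", regex=False), errors="coerce")

print(raised)
print(coerced.tolist())
print(fixed.tolist())
```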
EDIT:
If there may be leading or trailing whitespace, strip it first:
df['Column_name'] = (df['Column_name'].astype(str)
.str.strip()
.str.split('-')
.str[-1]
.str.replace(",", "."))
df['Column_name'] = pd.to_numeric(df['Column_name'], errors='coerce')
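As a rough end-to-end check, the same chain can be run on a small mixed Series. The values below are hypothetical, chosen to mimic a column where some cells were already read as floats and others as range strings (one with stray spaces). Note that splitting on "-" would also break negative numbers, which this sample deliberately does not contain:

```python
import pandas as pd

# Hypothetical sample: real floats mixed with range strings.
s = pd.Series([7.5, 10.0, "10-12,5", "8-12", " 8-12 "], name="Column_name")

cleaned = (
    s.astype(str)                        # floats become "7.5", "10.0"
     .str.strip()                        # drop surrounding whitespace
     .str.split("-")                     # "10-12,5" -> ["10", "12,5"]
     .str[-1]                            # keep the upper bound
     .str.replace(",", ".", regex=False) # decimal comma -> decimal point
)
result = pd.to_numeric(cleaned, errors="coerce")

print(result.tolist())
```

The astype(str) step is what keeps the existing floats: they survive as strings such as "7.5" and are parsed back by to_numeric at the end.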
If you have this dataframe:
Column_name
862 10-12,5
863 10-12,5
864 10-12,5
865 10-12,5
866 10-12,5
867 10-12,5
868 10-12,5
1129 8-12
1130 8-12
1131 8-12
1132 8-12
1133 8-12
Then:
df["Column_name"] = (
df["Column_name"].str.split("-").str[-1].str.replace(",", ".").astype(float)
)
print(df)
Prints:
Column_name
862 12.5
863 12.5
864 12.5
865 12.5
866 12.5
867 12.5
868 12.5
1129 12.0
1130 12.0
1131 12.0
1132 12.0
1133 12.0
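One caveat, assuming the column mixes real floats with strings as described in the question: on an object column the .str accessor returns NaN for non-string cells, so calling .str.split directly works only when every cell is a string. The sketch below uses made-up values to show the difference that casting to str first makes:

```python
import pandas as pd

# Hypothetical mixed column: one real float plus two range strings.
mixed = pd.Series([7.5, "10-12,5", "8-12"], name="Column_name")

# Applied directly, .str methods yield NaN for the non-string cell.
direct = mixed.str.split("-").str[-1]

# Casting to str first keeps the float as the string "7.5".
via_str = mixed.astype(str).str.split("-").str[-1]

print(direct.isna().tolist())
print(via_str.tolist())
```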