
Unable to strip non-numeric values from column in Pandas dataframe

I'm working on cleaning and EDA of a time series dataset of revenues. Some entries are prefixed with '(R) ', meaning the value has been revised, so they are displayed like (R) 1000. Example:

import pandas as pd

df = pd.DataFrame({
    'year': ['2005', '2006', '2007'],
    # '(R) 1000' is written as a string here so the snippet is valid Python
    'revenue': [500, '(R) 1000', 2200]})

Strangely, the data type for this column still shows as float64, and the column works when I draw a line plot. In the original Excel spreadsheet, when I select one of these cells, the (R) disappears and only the numerical value is displayed.

I tried the following code:

df['revenue'] = df['revenue'].replace('(R) ','', regex=True)

This code does not raise any errors, but the (R) values are still there when I look at the dataframe. The (R) seems to act as some kind of placeholder, but I cannot figure out how to remove it, and it conflicts with my other data.

Basically, I just want to change values such as (R) 1000 to 1000.

Assuming:

df = pd.DataFrame({
    'year': ['2005', '2006', '2007'], 
    'revenue': [500, '(R) 1000', 2200]})

You can use:

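# Extract the trailing digits from string entries; non-string entries give NaN
# and are filled back in from the original column before casting to int.
df['revenue'] = (df['revenue'].str.extract(r'(\d+)$', expand=False)
                 .fillna(df['revenue'])
                 .astype(int)
                 )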

Output:

   year  revenue
0  2005      500
1  2006     1000
2  2007     2200

Previous answer

Use pandas.to_numeric:

df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce')

To replace the non-numeric entries with a given value, combine it with fillna:

df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(1000)
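To make the effect of errors='coerce' concrete, here is a minimal sketch using the same sample dataframe as above. It assumes the column really contains the string '(R) 1000'. The names coerced and filled are only illustrative, and 1000 is just the placeholder fill value from the line above, not something derived from the data.

```python
import pandas as pd

df = pd.DataFrame({
    'year': ['2005', '2006', '2007'],
    'revenue': [500, '(R) 1000', 2200]})

# Entries that cannot be parsed as numbers become NaN instead of raising.
coerced = pd.to_numeric(df['revenue'], errors='coerce')
print(coerced.isna().tolist())

# Illustrative only: fill the unparseable entries with a fixed value.
filled = coerced.fillna(1000)
print(filled.tolist())
```

Note that this discards whatever number followed the (R) prefix, so it is only suitable when the replacement value is known some other way.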

This should remove all letters and parentheses from your strings:

df['revenue'].replace('[A-Za-z)(]','',regex=True).astype(int)
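As a side note on why the attempt in the question had no visible effect: with regex=True, the parentheses in '(R) ' are treated as a regex group rather than literal characters, so the pattern looks for "R" followed by a space, which never occurs in '(R) 1000'. The sketch below assumes the column holds strings as in the sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': ['2005', '2006', '2007'],
    'revenue': [500, '(R) 1000', 2200]})

# Unescaped: "(R) " is a group matching "R" plus a space, so nothing is replaced.
unescaped = df['revenue'].replace('(R) ', '', regex=True)

# Escaped: \( and \) match the literal parentheses.
escaped = df['revenue'].replace(r'\(R\) ', '', regex=True)

# Convert the remaining numeric strings to numbers.
cleaned = pd.to_numeric(escaped)
print(cleaned.tolist())
```

If the column's dtype is already float64, as described in the question, there are no strings to replace and the (R) is likely only a display format in Excel.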

