Removing numbers from strings in a DataFrame column

I want to remove numbers from strings in a column, while keeping the entries in the same column that consist only of numbers. This is what the data looks like:

df=
id       description
1         XG154LU
2         4562689
3         556
4         LE896E
5         65KKL4

This is how I want the output to look:

id       description
1         XGLU
2         4562689
3         556
4         LEE
5         KKL

I used the code below, but when I run it, it removes all the entries in the description column and replaces them with blanks:

import re

def clean_text_round1(text):
    text = re.sub(r'\w*\d\w*', '', text)  # matches any whole token that contains a digit
    text = re.sub('[‘’“”…]', '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\r', '', text)
    return text

round1 = lambda x: clean_text_round1(x)
df['description'] = df['description'].apply(round1)

Try:

import numpy as np

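# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal by default
df['description'] = np.where(df.description.str.contains(r'^\d+$'), df.description, df.description.str.replace(r'\d+', '', regex=True))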

Output:

id       description
1         XGLU
2         4562689
3         556
4         LEE
5         KKL

Logic:

Check whether the string contains only digits. If it does, do nothing and copy the number as it is. If the string mixes digits with letters, replace the digits with an empty string '', leaving only the non-digit characters.
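
The same two branches can be checked on plain Python strings with the standard `re` module, without pandas. This is only a sketch: the list `samples` and the helper name `remove_digits_unless_numeric` are made up for illustration. It also suggests why the pattern from the question, `\w*\d\w*`, blanks every entry: the `\w*` on each side of `\d` swallows the rest of the token.

```python
import re

# Sample values copied from the question's description column
samples = ["XG154LU", "4562689", "556", "LE896E", "65KKL4"]

def remove_digits_unless_numeric(value):
    # Hypothetical helper: digits-only strings are returned untouched
    if re.fullmatch(r"\d+", value):
        return value
    # Mixed strings lose every run of digits
    return re.sub(r"\d+", "", value)

cleaned = [remove_digits_unless_numeric(s) for s in samples]
print(cleaned)

# The question's pattern matches the entire token once it contains a digit
blanked = [re.sub(r"\w*\d\w*", "", s) for s in samples]
print(blanked)
```

Here `cleaned` keeps `4562689` and `556` intact, while `blanked` ends up as a list of empty strings.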

This should solve it for you.

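def clean_text_round1(text):
    # Leave integers, and strings made up only of digits, unchanged
    if isinstance(text, int) or text.isdigit():
        return text
    # Otherwise drop every digit character
    return ''.join(ch for ch in text if not ch.isdigit())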

df['description'] = df['description'].apply(clean_text_round1)

Let me know if this works for you. I am not sure about the speed. You can also use a regex instead of join.
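
A regex version of that idea might look like the sketch below. The function name `clean_text_regex` and the precompiled patterns are assumptions for illustration, not part of the answer above. It also treats digits-only strings like integers, on the assumption that the column may hold its numbers as text.

```python
import re

# Hypothetical precompiled patterns
ONLY_DIGITS = re.compile(r"\d+")
ANY_DIGIT = re.compile(r"\d")

def clean_text_regex(text):
    # Integers pass through unchanged, as in clean_text_round1
    if isinstance(text, int):
        return text
    # Assumption: digits-only strings should be kept as they are
    if ONLY_DIGITS.fullmatch(text):
        return text
    # re.sub takes the place of the character-by-character join
    return ANY_DIGIT.sub("", text)

print(clean_text_regex("LE896E"))
```

It can be passed to `apply` in the same way, for example `df['description'].apply(clean_text_regex)`.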

def convert(v):
    # If the string contains at least one letter, keep only its letters
    if any(char.isalpha() for char in v):
        return ''.join(char for char in v if char.isalpha())
    # Otherwise (no letters, e.g. digits only) return it unchanged
    return v

# Apply the function to a single column
df['description'] = df['description'].apply(convert)
print(df)

id  description
0        XGLU
1     4562689
2         556
3         LEE
4         KKL
