How can I remove special characters from just one column in a data frame?
I am trying to clean my data frame, but I only want to remove special characters from one column (see the table below).
df1
| A       | B  | C  |
|---------|----|----|
| Ags(1)  | 5  | 4  |
| Cdmx(2) | 6  | 6  |
| Leon(4) | 90 | 45 |
What I want to remove is just the numbers and special characters in column A.
This is what I tried:
df = re.sub('[^A-Za-z0-9]+', '', df1["A"])
>> expected string or bytes-like object
I would try using a lambda with the apply function on the column you want.
import re
df1["A"] = df1["A"].apply(lambda x: re.sub('[^A-Za-z]+', '', x))  # keep letters only; digits and special characters are dropped
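The attempt in the question fails because re.sub works on a single string, while df1["A"] is a whole pandas Series. As an alternative to apply, pandas' own .str.replace applies a regex to every element and leaves missing values alone. The following is only a sketch: the frame is rebuilt by hand from the table in the question, and the pattern assumes that only ASCII letters should be kept.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df1 = pd.DataFrame({
    "A": ["Ags(1)", "Cdmx(2)", "Leon(4)"],
    "B": [5, 6, 90],
    "C": [4, 6, 45],
})

# Remove every run of non-letter characters from column A only.
# regex=True is passed explicitly because the default differs between pandas versions.
df1["A"] = df1["A"].str.replace(r"[^A-Za-z]+", "", regex=True)

print(df1)
```

Columns B and C are not touched, since the assignment targets only df1["A"].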
You can also use .str.extract() to keep the part you want (as opposed to replace, which eliminates the part you don't want):
from io import StringIO
import pandas as pd
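# Columns are separated by two or more spaces, which is what the separator regex expects
data = '''A        B   C
Ags(1)   5   4
Cdmx(2)  6   6
Leon(4)  90  45
'''
# Raw string avoids invalid-escape warnings for \s
df = pd.read_csv(StringIO(data), sep=r'\s\s+', engine='python')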
df['A'] = df['A'].str.extract(r'(\w+)', expand=False)
print(df)
      A   B   C
0   Ags   5   4
1  Cdmx   6   6
2  Leon  90  45
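One caveat with the pattern above: \w matches digits and underscores as well as letters, so a name that itself contains a digit would keep it. The sketch below compares \w+ with a letters-only pattern on a small made-up Series (the values "Cdmx2(2)" and None are hypothetical, added only to show the difference and the handling of missing values).

```python
import pandas as pd

# Hypothetical values: one name containing a digit, one missing entry.
s = pd.Series(["Ags(1)", "Cdmx2(2)", None])

# \w+ captures the first run of letters, digits, or underscores.
word = s.str.extract(r"(\w+)", expand=False)

# [A-Za-z]+ captures the first run of ASCII letters only.
letters = s.str.extract(r"([A-Za-z]+)", expand=False)

print(pd.DataFrame({"original": s, "word": word, "letters": letters}))
```

Missing entries stay missing in both results, so no extra check is needed before calling .str.extract().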