How should I remove special characters from a data frame, except spaces?
I am reading an Excel file (one specific sheet), and it looks very much like the sample below. I would like to remove all the numbers, underscores and hyphens in the 'Org' column. The output under 'Org' should be
ddc systems
and so on.
  Name               Org
0  abc  14_ddc_-_systems
1  sdc  14_ddc_-_systems
2  csc  14_ddd_-_systems
3  rdc        23_kbf_org
4  rfc        23_kbf_org
I tried the following to remove the numbers, but it's not working:
s = sheet1['Org'].head()
s = s.replace('\d+\s', '')
Any help would be appreciated!
You can use str.replace with a regular expression.
Example:
import pandas as pd
df = pd.DataFrame({"Org": ["14_ddc_-_systems", "14_ddc_-_systems", "23_kbf_org"]})
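# Replace every run of characters that are not letters or spaces with one space,
# then trim the ends. regex=True is needed in pandas 2.0+, where str.replace
# treats the pattern as a literal string by default.
df["New"] = df["Org"].str.replace(r"[^a-zA-Z ]+", " ", regex=True).str.strip()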
print(df)
Output:
                Org          New
0  14_ddc_-_systems  ddc systems
1  14_ddc_-_systems  ddc systems
2        23_kbf_org      kbf org
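As for why the attempt in the question had no effect: Series.replace without regex=True only swaps cells whose entire value equals the given string, and the pattern \d+\s also expects whitespace after the digits, which this data does not have (the digits are followed by an underscore). Here is a minimal sketch applying the same idea to the question's data. It assumes the sheet is already loaded into a DataFrame called sheet1, as in the question; the frame is rebuilt by hand here, and the name unchanged is only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the Excel sheet (values copied from the question).
sheet1 = pd.DataFrame({
    "Name": ["abc", "sdc", "csc", "rdc", "rfc"],
    "Org": ["14_ddc_-_systems", "14_ddc_-_systems", "14_ddd_-_systems",
            "23_kbf_org", "23_kbf_org"],
})

# Without regex=True, Series.replace compares whole cell values to the string,
# so no cell matches and the column comes back unchanged.
unchanged = sheet1["Org"].replace(r"\d+\s", "")

# Replace each run of non-letter characters with one space, then trim the ends.
sheet1["Org"] = (
    sheet1["Org"]
    .str.replace(r"[^a-zA-Z]+", " ", regex=True)
    .str.strip()
)

print(sheet1)
```

This overwrites 'Org' in place instead of adding a new column; assign the result to a different column name if the original values are still needed.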