Remove the parenthesized part of a string in pandas
There are also several countries with numbers and/or parentheses in their names. Be sure to remove these, e.g. 'Cuba (Island of Caribeas)' should be 'Cuba'.
Input DataFrame (DF in):
Country Energy
18 Mexico 321000000
19 Cuba (Island of Caribeas) 102000000
20 Algeria 1959000000
21 American 2252661245
22 Andorra(no mentioned) 9000000
I would like to get this DataFrame (DF out):
Country Energy
18 Mexico 321000000
19 Cuba 102000000
20 Algeria 1959000000
21 American 2252661245
22 Andorra 9000000
I am trying this:
for item in df['Country']:  # remove the () with the data inside
    re.sub(r" ?\(\w+\)", "", item)
But I don't get any changes in my DF, and no error, so I don't know what I am doing wrong. Could someone please help me?
This could be a start. Try:
df['Country'] = df['Country'].apply(lambda x: re.sub(r" ?\(\w+\)", "", x))
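This applies your expression to each value in df['Country'] and assigns the result back to the column. Your loop changed nothing because re.sub returns a new string rather than modifying anything in place, and the loop discarded that return value.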
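To see the difference between the loop and the assignment, here is a minimal sketch. The sample frame is an assumption for illustration; in particular 'Andorra (Europe)' is a hypothetical value, chosen because it has a single word inside the parentheses.

```python
import re

import pandas as pd

# Hypothetical sample data; "Andorra (Europe)" is invented for illustration.
df = pd.DataFrame({"Country": ["Cuba (Island of Caribeas)", "Andorra (Europe)"]})

# Loop form: re.sub returns a new string each time, but nothing stores it.
for item in df["Country"]:
    re.sub(r" ?\(\w+\)", "", item)
after_loop = df["Country"].tolist()

# apply form: the returned strings are assigned back to the column.
df["Country"] = df["Country"].apply(lambda x: re.sub(r" ?\(\w+\)", "", x))
after_apply = df["Country"].tolist()

print(after_loop)
print(after_apply)
```

With this pattern only the single-word parenthesis is stripped, because \w+ does not match the spaces in 'Island of Caribeas'.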
The regular expression isn't quite right: what if there is whitespace inside the brackets?
import pandas as pd
s = pd.Series(['Cuba (Island of Caribeas)', 'Andorra(no mentioned)', 'Algeria'])
s.replace(r" ?\((?:\w+ ?)+\)", "", regex=True)
This will return:
Out[13]:
0 Cuba
1 Andorra
2 Algeria
dtype: object
To adapt it to your example:
df['Country'] = df['Country'].replace(r" ?\((?:\w+ ?)+\)", "", regex=True)
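Since the question also mentions numbers in country names, a slightly more permissive variant might look like the sketch below. This is not the pattern from the answers above: it assumes that anything between '(' and the next ')' should go, and that digits never belong in a country name. 'Italy9' is a hypothetical value added only to exercise the digit rule.

```python
import pandas as pd

# Values taken from the question, plus the hypothetical "Italy9".
df = pd.DataFrame(
    {"Country": ["Mexico", "Cuba (Island of Caribeas)", "Andorra(no mentioned)", "Italy9"]}
)

cleaned = (
    df["Country"]
    .str.replace(r"\s*\([^)]*\)", "", regex=True)  # drop "(...)" and any whitespace before it
    .str.replace(r"\d+", "", regex=True)           # drop digits (assumption: never part of a name)
    .str.strip()                                   # trim leftover leading/trailing whitespace
)

df["Country"] = cleaned
print(df["Country"].tolist())
```

Using [^)]* instead of (?:\w+ ?)+ also covers punctuation inside the parentheses, at the cost of not handling nested parentheses.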