Replace part of a string containing '()'

Question

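I need to clean up some country names in the 'Country' column of a dataframe that contain '()' or digits. For example, 'Bolivia (Plurinational State of)' should become 'Bolivia', and 'Switzerland17' should become 'Switzerland'.

I am using the code below: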

df3['Country'] = df3['Country'].str.replace(r'[^(][\w]*[)]','')
df3['Country'] = df3['Country'].str.replace(r'[\d]*','')

Where am I going wrong? Can you help?

Answer 1

You can remove both the parenthesized text and the digits in a single replacement:

\s*(?:\([^()]*\)|\d+)

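Explanation

\s* matches 0+ whitespace characters
(?: opens a non-capturing group
- \([^()]*\)|\d+ matches from ( until ), or matches 1+ digits
) closes the non-capturing group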
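The breakdown above can be sketched with Python's standard `re` module, writing the same pattern in verbose mode so that each part carries its own comment. The `samples` list and the `cleaned` name are only illustrative; the two strings are the examples from the question.

```python
import re

# Same pattern as above, spelled out in verbose mode (whitespace and
# comments inside the pattern are ignored by the regex engine)
pattern = re.compile(r"""
    \s*             # zero or more whitespace characters before the match
    (?:             # start of the non-capturing group
        \([^()]*\)  # an opening parenthesis, any non-parenthesis characters, a closing parenthesis
      |             # or
        \d+         # one or more digits
    )               # end of the non-capturing group
""", re.VERBOSE)

samples = ['Bolivia (Plurinational State of)', 'Switzerland17']
cleaned = [pattern.sub('', s) for s in samples]
print(cleaned)  # ['Bolivia', 'Switzerland']
```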

Regex demo

df3['Country'] = df3['Country'].str.replace(r'\s*(?:\([^()]*\)|\d+)', '', regex=True)

Output

       Country
0      Bolivia
1  Switzerland

Answer 2

You should use

df3['Country'].str.replace(r"\s*(?:\d+|\([^()]*\))", "", regex=True).str.strip()

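See the regex demo. Details:

\s* - zero or more whitespace characters
(?:\d+|\([^()]*\)) - one or more digits, or a (, then zero or more characters other than ( and ), then a )

The .str.strip() call is needed when a match is at the start of the string and is followed by whitespace.

See the pandas test: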

>>> import pandas as pd
>>> df3 = pd.DataFrame({'Country':['Bolivia (Plurinational State of)','Switzerland17','(Republic of) Korea']})
>>> df3['Country'].str.replace(r"\s*(?:\d+|\([^()]*\))", "", regex=True).str.strip()
0        Bolivia
1    Switzerland
2          Korea
Name: Country, dtype: object
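To make the role of .str.strip() concrete, here is a small sketch comparing the result with and without it. The variable names `without_strip` and `with_strip` are only illustrative, and `regex=True` is passed explicitly on the assumption that a recent pandas version is in use, where patterns are treated as literal strings by default.

```python
import pandas as pd

s = pd.Series(['(Republic of) Korea', 'Bolivia (Plurinational State of)'])
pattern = r"\s*(?:\d+|\([^()]*\))"

# \s* only consumes whitespace BEFORE a match, so the space that
# follows a leading "(...)" group is left behind
without_strip = s.str.replace(pattern, "", regex=True)
with_strip = without_strip.str.strip()

print(without_strip.tolist())  # [' Korea', 'Bolivia']
print(with_strip.tolist())     # ['Korea', 'Bolivia']
```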

Answer 3

I would use the following pattern: '\([^)]*\)|[\d.*]'. The | character lets you combine multiple patterns in a single expression.

df = pd.DataFrame({'Country':['Bolivia (Plurinational State of)','Switzerland17']})

Original df:

    Country
0   Bolivia (Plurinational State of)
1   Switzerland17

Suggested code:

df['Country'] = df['Country'].str.replace(r'\([^)]*\)|[\d.*]','',regex=True)

Output:

    Country
0   Bolivia
1   Switzerland
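A quick check of this pattern may be worthwhile before relying on it. The sketch below prints the values as a list so that whitespace is visible. It suggests two things: the space before the parenthesis is not removed, and `[\d.*]` is a character class, so it also deletes literal `.` and `*` characters. The 'St. Kitts2' entry is a hypothetical value added only to show the second effect, and the trailing `.str.strip()` is one possible follow-up rather than part of the original answer.

```python
import pandas as pd

# 'St. Kitts2' is a made-up value, included only to show what happens to '.'
df = pd.DataFrame({'Country': ['Bolivia (Plurinational State of)',
                               'Switzerland17',
                               'St. Kitts2']})

result = df['Country'].str.replace(r'\([^)]*\)|[\d.*]', '', regex=True)
print(result.tolist())    # ['Bolivia ', 'Switzerland', 'St Kitts']

# Stripping afterwards removes the leftover trailing space
stripped = result.str.strip()
print(stripped.tolist())  # ['Bolivia', 'Switzerland', 'St Kitts']
```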
