How to skip the split if a row contains a specific string in pandas?

Question

I have a column that contains state and country names:

Name   Region       Value_1 etc.
Apple  Penn State    5641561
Apple  Boston State   21515151
Apple  United States  5545645
etc.

I want to remove the part of the string after the space (" "), but keep United States as it is.

For example:

Name   Region       Value_1 etc.
Apple  Penn          5641561
Apple  Boston         21515151
Apple  United States  5545645
etc.

How can I do this? I am currently splitting with the following code: df['Region'] = df['Region'].str.split(' ').str[0]

Answer 1

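IIUC, you can use Series.str.replace to replace occurrences of a pattern in the Series with a replacement string. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default:

df['Region'] = df['Region'].str.replace(r'(\sState)\b', '', regex=True)

Result: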

# print(df)

    Name         Region   Value_1
0  Apple           Penn   5641561
1  Apple         Boston  21515151
2  Apple  United States   5545645
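
To make the behaviour of the pattern easier to check, here is a minimal self-contained sketch. It assumes the sample data from the question and uses a slightly different pattern, \s+State$, which is my own variant rather than the answer's: it removes only a trailing " State". "United States" ends in "States", so neither this pattern nor the \b version above touches it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Name': ['Apple', 'Apple', 'Apple'],
    'Region': ['Penn State', 'Boston State', 'United States'],
    'Value_1': [5641561, 21515151, 5545645],
})

# regex=True is passed explicitly so the pattern is treated as a regex.
# \s+State$ matches whitespace followed by "State" at the end of the string.
df['Region'] = df['Region'].str.replace(r'\s+State$', '', regex=True)

print(df['Region'].tolist())  # ['Penn', 'Boston', 'United States']
```

Anchoring with $ is a design choice: a value such as "State College" would be left alone, whereas a plain replace of "State" would alter it.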

Answer 2

Try this:

import pandas as pd
df = pd.DataFrame({'Name': ['Apple', 'Apple', 'Apple'], 'Region': ['Penn State', 'Boston State', 'United States']})
# strip() removes the space left behind once 'State' is removed
df['Region'] = df['Region'].apply(lambda x: x.replace('State', '').strip() if x.split()[-1].strip() == 'State' else x)

Output:


    Name    Region
0   Apple   Penn
1   Apple   Boston
2   Apple   United States
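
The same idea can also be written without apply. The following is only a sketch under the assumption that the suffix to drop is always the literal " State"; the names suffix and ends_with_state are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Apple', 'Apple', 'Apple'],
    'Region': ['Penn State', 'Boston State', 'United States'],
})

suffix = ' State'  # assumed literal suffix, including the leading space

# Boolean mask: True only for rows whose Region ends with the suffix.
ends_with_state = df['Region'].str.endswith(suffix)

# Slice the suffix off the matching rows; all other rows stay untouched.
df.loc[ends_with_state, 'Region'] = df.loc[ends_with_state, 'Region'].str[:-len(suffix)]

print(df['Region'].tolist())  # ['Penn', 'Boston', 'United States']
```

Because the mask checks for " State" with its leading space, "United States" does not match and is kept as it is.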

Answer 3

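An alternative using Series.where(), which works much like np.where(): values are kept where the condition is True and replaced elsewhere.

### Create DataFrame
import pandas as pd
df = pd.DataFrame({
'Name': ['Apple', 'Apple', 'Apple'],
'Region': ['Penn State', 'Boston State', 'United States'],
'Value_1': [5641561, 21515151, 554564]
})

### Using Series.where(): keep rows containing 'United States', otherwise take the first word
df['Region'] = df['Region'].where(df['Region'].str.contains('United States'), 
                            df['Region'].str.split(" ").str[0])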

### Output
print(df)

    Name         Region   Value_1
0  Apple           Penn   5641561
1  Apple         Boston  21515151
2  Apple  United States    554564
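
The code above calls the Series.where() method. For comparison, a sketch with NumPy's np.where() function itself might look like this; it assumes the same sample data, and the names keep and first_word are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Apple', 'Apple', 'Apple'],
    'Region': ['Penn State', 'Boston State', 'United States'],
    'Value_1': [5641561, 21515151, 554564],
})

# Rows to leave unchanged (plain substring match, no regex).
keep = df['Region'].str.contains('United States', regex=False)

# Candidate replacement: the first word of each Region.
first_word = df['Region'].str.split(' ').str[0]

# np.where(condition, value_if_true, value_if_false)
df['Region'] = np.where(keep, df['Region'], first_word)

print(df['Region'].tolist())  # ['Penn', 'Boston', 'United States']
```

Note the argument order: np.where() takes the condition first, while Series.where() is called on the values to keep and receives the replacement as its second argument.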
