Using a regular expression to remove a specific part of the strings in a DataFrame column (Python)

Question

I am working with a DataFrame that contains addresses, and I want to remove a specific part of each string. For example:

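I want to remove everything from the word "REFERENCE:" or "reference:" to the end of the string. I also want to create a new column that holds the result (without the word REFERENCE:/reference: and the text that follows it). Can you help me do this with a regular expression? I would like the new column to look like this: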

Answer 1

You can use a regular expression to get the desired result.

import pandas as pd

df = pd.DataFrame({"address": ["Street Pases de la Reforma #200 REFERENCE: Green house", "Street Carranza #300 12 & 13 REFERENCE: There is a tree"]})

# Capture everything that precedes the literal text REFERENCE (case-sensitive).
df.address.str.findall(r".+?(?=REFERENCE)").explode()

0    Street Pases de la Reforma #200 
1       Street Carranza #300 12 & 13

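Explanation of the regex pattern:

. matches any character (except line terminators)
+? Quantifier: matches the preceding token between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?=REFERENCE) Positive lookahead: asserts that the text REFERENCE follows at this position, without including it in the match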
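The pattern above only matches the uppercase spelling and leaves the space before REFERENCE in the result. As a minimal sketch (not part of the original answer), the same lazy-match-plus-lookahead idea can be tried with the standard `re` module. The `re.IGNORECASE` flag, the `\s*` inside the lookahead, the helper name `strip_reference`, and the lowercase sample string are all assumptions made for illustration:

```python
import re

# Lazily match everything up to (but not including) optional whitespace
# followed by "reference"; IGNORECASE also covers "REFERENCE".
pattern = re.compile(r".+?(?=\s*reference)", re.IGNORECASE)


def strip_reference(text):
    """Return the part of text before 'reference', or text unchanged if it is absent."""
    match = pattern.match(text)
    return match.group(0) if match else text


# Illustrative sample data; the second entry uses the lowercase spelling.
samples = [
    "Street Pases de la Reforma #200 REFERENCE: Green house",
    "Street Carranza #300 12 & 13 reference: There is a tree",
    "Main Street #5",
]
cleaned = [strip_reference(s) for s in samples]
print(cleaned)
```

Because the lookahead consumes nothing, the matched text ends right before the whitespace that precedes the keyword, and strings without the keyword are returned as they are.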

Answer 2

The regular expression should look like this:

import re

# Match "reference:" in any letter case, plus everything after it on the same line.
discard_re = re.compile(r'(reference:.*)', re.IGNORECASE | re.MULTILINE)

Then you can add the new column:

df['address_new'] = df.addresses.map(lambda x: discard_re.sub('', x))
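For comparison, here is a self-contained sketch of the same idea using the vectorized `Series.str.replace` instead of `map` with a lambda. It is an illustration, not the answerer's code: the column name `address` follows the sample frame from Answer 1 (this answer uses `addresses`), the second sample string is changed to the lowercase spelling, and the leading `\s*` is an added assumption to drop the space left in front of the keyword.

```python
import re

import pandas as pd

# Illustrative sample data, adapted from Answer 1.
df = pd.DataFrame({"address": [
    "Street Pases de la Reforma #200 REFERENCE: Green house",
    "Street Carranza #300 12 & 13 reference: There is a tree",
]})

# Remove optional whitespace, the keyword in any letter case,
# and everything after it; the original column is left untouched.
df["address_new"] = df["address"].str.replace(
    r"\s*reference:.*", "", flags=re.IGNORECASE, regex=True
)

result = df["address_new"].tolist()
print(result)
```

Passing `regex=True` explicitly avoids relying on the default, which has changed between pandas versions.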

Answer 1
1 accepted 2020-09-23 01:57:57

Answer 2
1 2020-09-23 01:59:23
