Regex to remove specific parts of a string in a DataFrame column (Python)
You can get the desired result with a regular expression.
import pandas as pd
df = pd.DataFrame({"address": ["Street Pases de la Reforma #200 REFERENCE: Green house", "Street Carranza #300 12 & 13 REFERENCE: There is a tree"]})
df.address.str.findall(r".+?(?=REFERENCE)").explode()
0 Street Pases de la Reforma #200
1 Street Carranza #300 12 & 13
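Explanation of the regex pattern:
. matches any character (except for line terminators)
+? Quantifier: matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?=REFERENCE) Positive lookahead: asserts that the literal text REFERENCE comes next, without including it in the match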
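To make the explanation above concrete, here is a small sketch using only the standard `re` module. The second sample string (with two REFERENCE markers) is made up for illustration and is not from the question; it is there only to show how the lazy quantifier differs from a greedy one.

```python
import re

lazy = re.compile(r".+?(?=REFERENCE)")   # stop at the first REFERENCE
greedy = re.compile(r".+(?=REFERENCE)")  # stop at the last REFERENCE

text = "Street Carranza #300 12 & 13 REFERENCE: There is a tree"
match = lazy.search(text)
# The lookahead is not part of the match, so only the text before
# REFERENCE is returned (including the space that precedes it).
kept = match.group(0) if match else text

# Hypothetical input with two markers, to compare lazy and greedy matching.
twice = "a REFERENCE b REFERENCE c"
lazy_part = lazy.search(twice).group(0)
greedy_part = greedy.search(twice).group(0)

# A string without the marker produces no match at all.
no_marker = lazy.search("Street without marker")

print(repr(kept), repr(lazy_part), repr(greedy_part), no_marker)
```

Note that the matched text keeps the space before REFERENCE, so a `.strip()` may be wanted afterwards, and that rows without the marker yield no match rather than the original string.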
Alternatively, the regex could look like this:
import re
discard_re = re.compile('(reference:.*)', re.IGNORECASE | re.MULTILINE)
Then you can add a new column (note that the column is named address, not addresses):
df['address_new'] = df['address'].map(lambda x: discard_re.sub('', x))
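The same substitution can probably be written with pandas' vectorized string methods instead of `map`. The following is a minimal sketch, not the answerer's code: the leading `\s*` in the pattern is an added assumption so that the whitespace before the marker is removed as well, and the third row (an address with no REFERENCE part) is hypothetical.

```python
import re
import pandas as pd

df = pd.DataFrame({"address": [
    "Street Pases de la Reforma #200 REFERENCE: Green house",
    "Street Carranza #300 12 & 13 reference: There is a tree",
    "Avenue Juarez #10",  # hypothetical row without a REFERENCE part
]})

# Drop "reference:" (any letter case) and everything after it,
# together with the whitespace immediately before it.
df["address_new"] = df["address"].str.replace(
    r"\s*reference:.*", "", flags=re.IGNORECASE, regex=True
)

print(df["address_new"].tolist())
```

Unlike the lookahead approach, a substitution leaves rows without the marker unchanged, which may be preferable when not every address carries a reference.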