Python regex: remove everything except strings from a list

Question

I have a string:

bdv. mot. g. vns. kilm.

And a known list of strings like:

important_strings_lst=['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']

I want a regex that selects only:

bdv. mot. g.

I joined the list and tried this (idea from here):

import re
regex = re.compile(r'\b(?!bdv.|dktv.|mot. g.|vyr. g.)\w+', re.UNICODE)
regex.sub("", 'bdv. mot. g. vns. kilm.')

I got:

'bdv. mot. . . .'

Replacing the spaces in the regex with \\s didn't work either. How can I do this?

I could use something like [x for x in important_strings_lst if x in my_string], but I need good performance, because this will be applied with str.replace to millions of rows of a pandas DataFrame.

Answer 1

The . character has a special meaning in regular expressions: it matches any character (except a newline). You can use re.escape to make a string "safe" for use in a regular expression.

>>> import re
>>> important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
>>> regex = re.compile('|'.join(re.escape(s) for s in important_strings))
>>> regex.findall('bdv. mot. g. vns. kilm.')
['bdv.', 'mot. g.']
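To see why escaping matters, the short check below (a sketch using only the standard re module) compares a raw keyword with its escaped form. Unescaped, the trailing . matches any character, so the pattern 'bdv.' also accepts 'bdvX'; the same problem affects the unescaped alternatives in the question's lookahead.

```python
import re

keyword = "bdv."

# Unescaped: "." is a metacharacter, so it matches any single character.
loose = re.fullmatch(keyword, "bdvX")

# Escaped: re.escape turns "." into "\.", so only a literal dot matches.
strict = re.fullmatch(re.escape(keyword), "bdvX")
exact = re.fullmatch(re.escape(keyword), "bdv.")

print(loose is not None)   # True
print(strict is not None)  # False
print(exact is not None)   # True
```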

Pandas has its own findall (Series.str.findall), which should work like re.findall.
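The question asks for the string 'bdv. mot. g.' rather than a list, so one option is to join the findall matches with spaces. The sketch below does that; build_pattern and keep_only are hypothetical helper names, not part of the answer. It also sorts the keywords longest-first, an assumption that only matters if one keyword is a prefix of another (say 'mot.' and 'mot. g.'), because the regex alternation takes the first alternative that matches.

```python
import re

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']


def build_pattern(keywords):
    """Hypothetical helper: compile an alternation of escaped keywords.

    Longer keywords are placed first so that a keyword which is a prefix
    of another one does not win the alternation.
    """
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered))


def keep_only(text, pattern):
    """Hypothetical helper: keep only the keyword matches, space-separated."""
    return " ".join(pattern.findall(text))


pattern = build_pattern(important_strings)
print(keep_only('bdv. mot. g. vns. kilm.', pattern))  # bdv. mot. g.
```

For a DataFrame column, the same idea should carry over by passing pattern.pattern to Series.str.findall and then calling .str.join(' ') on the resulting lists; this has not been benchmarked here.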

Answer 2

Maybe split the string

    bdv. mot. g. vns. kilm.

using your list, and remove from the original string whatever is left after splitting.
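One way to read this suggestion (an interpretation, not code from the answer) is to split on the escaped keywords with a capturing group. re.split then keeps the keywords in its result, alternating with the leftover pieces, so dropping the leftovers leaves only the keywords in their original order.

```python
import re

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
text = 'bdv. mot. g. vns. kilm.'

# Assumption: this is one reading of "split, then remove what is left".
# The capturing group makes re.split keep the matched keywords.
pattern = re.compile("(" + "|".join(re.escape(s) for s in important_strings) + ")")
parts = pattern.split(text)
# parts == ['', 'bdv.', ' ', 'mot. g.', ' vns. kilm.']

leftovers = parts[0::2]  # even indices: text between keywords
keywords = parts[1::2]   # odd indices: the captured keywords
result = " ".join(keywords)
print(result)  # bdv. mot. g.
```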

2 answers

Answer 1
0 votes, accepted, 2018-11-10 17:51:10

Answer 2
0 votes, 2018-11-10 18:08:25
