
Python regex remove everything except strings from list

I have a string:

bdv. mot. g. vns. kilm.

And a known list of strings like:

important_strings_lst=['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']

I want a regex that selects only the listed strings, like:

bdv. mot. g.

I joined the list and tried this (idea from here):

regex = re.compile(r'\b(?!bdv.|dktv.|mot. g.|vyr. g.)\w+', re.UNICODE)
regex.sub("", 'bdv. mot. g. vns. kilm.')

Got:

'bdv. mot. . . .'

Swapping \s into the regex in various places also didn't work. How can I do this?

I could use something like [x for x in important_strings_lst if x in my_string], but I need good performance, as this will be used with str.replace on a pandas DataFrame with millions of rows.

The . character has a special meaning in regular expressions: it matches any character. You can use re.escape to make a string "safe" for use in a regular expression.

>>> import re
>>> important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
>>> regex = re.compile('|'.join(re.escape(s) for s in important_strings))
>>> regex.findall('bdv. mot. g. vns. kilm.')
['bdv.', 'mot. g.']
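To see why the escaping matters, here is a small sketch comparing an unescaped and an escaped pattern. The test string 'bdvx bdv.' is made up for illustration and is not from the question.

```python
import re

# Unescaped: the trailing "." matches ANY character, so "bdvx" also matches.
unescaped = re.compile('bdv.')
# Escaped: re.escape turns "." into a literal dot.
escaped = re.compile(re.escape('bdv.'))

sample = 'bdvx bdv.'  # hypothetical input, for illustration only
unescaped_hits = unescaped.findall(sample)
escaped_hits = escaped.findall(sample)

print(unescaped_hits)  # ['bdvx', 'bdv.']
print(escaped_hits)    # ['bdv.']
```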

Pandas has its own findall (Series.str.findall), which should work like re.findall.
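A minimal sketch of how this might look on a DataFrame, assuming the text lives in a column named text (the column names and the extra sample rows are made up here). Sorting the list longest-first is an added precaution, not part of the answer above: it only matters if one entry is a prefix of another.

```python
import re
import pandas as pd

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']

# Longest entries first, so a longer entry wins over a shorter one
# that shares its prefix (not the case for this list, but harmless).
ordered = sorted(important_strings, key=len, reverse=True)
pattern = '|'.join(re.escape(s) for s in ordered)

# Hypothetical data; only the first row comes from the question.
df = pd.DataFrame({'text': ['bdv. mot. g. vns. kilm.',
                            'vyr. g. dktv. kilm.',
                            'nothing here']})

# str.findall returns a list of matches per row; str.join glues them back.
df['kept'] = df['text'].str.findall(pattern).str.join(' ')

print(df['kept'].tolist())  # ['bdv. mot. g.', 'vyr. g. dktv.', '']
```

This keeps the matches in the order they appear in each row and yields an empty string where nothing matches.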

Maybe split the string

    bdv. mot. g. vns. kilm.

using your list, and remove from the original string whatever is left after splitting.
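One way to sketch the splitting idea, as a variation rather than exactly what is described above: split on a capturing group so the matched pieces are kept in the result, then drop the leftovers instead of removing them from the original string. The variable names are illustrative.

```python
import re

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
alternation = '|'.join(re.escape(s) for s in important_strings)

# With a capturing group, re.split keeps the separators in the output:
# even indexes hold the leftovers, odd indexes hold the matched strings.
splitter = re.compile('(' + alternation + ')')

text = 'bdv. mot. g. vns. kilm.'
parts = splitter.split(text)
kept = ' '.join(parts[1::2])

print(parts)  # ['', 'bdv.', ' ', 'mot. g.', ' vns. kilm.']
print(kept)   # bdv. mot. g.
```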
