
Regular expression substitution in Python

I am having trouble with this one. I am trying to get a better handle on regular expressions, but it is not working. I have a list of strings that I want to erase wherever they are found in another string.

This is the exclusion list:

exclusionList = ['\+','of','<ET>f.','to','the','<L>L.</L>','f.','in','and','see','a','<L>Fr.</L>','as','<ET>ad.','<ET>a.','<PS>v.</PS></XR>',
             'from','<CF>ab</CF>','or','n.','<L>OFr.</L>','pple.','away','was','with','off,','pa.','on','is','cf.','stem','ad.','which',
             'by','action','ppl.','Cf.','but','<L>Gr.</L>','be','after','=','The','form','for','an','<XR><RX>prec.</RX></XR>',
             '<PS>a.</PS></XR>','<L>Eng.</L>','<PS>pref.</PS>','also','L.</L>','<XR><XL>a-</XL>','<XR><XL>-ing</XL><HO>1</HO></XR>.</ET>',
             'vb.','See','In','<L>OE.</L>','used','it','see','this','not','<PS>prep.</PS><HO>1</HO></XR>','has','a','so','early','s']

And this is what I am using to remove those words:

first_word = re.sub(r'\b'+exclusionList[a]+'\b', '',first_word)

where first_word is a string read from a text file. I know this is going to be simple, but I just do not quite get how to use regular expressions well.

Thanks
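A likely reason the question's line fails (an illustrative aside, not part of the original posts): only the first `'\b'` is a raw string. In an ordinary Python string literal, `'\b'` is the backspace character, so the pattern ends with a literal backspace instead of a word boundary and never matches normal text. The sample text below is made up for illustration.

```python
import re

# A plain '\b' is the backspace character; a raw r'\b' is the two characters
# backslash + 'b', which the regex engine reads as a word boundary.
assert '\b' == '\x08'
assert r'\b' == '\\b'

text = 'a test of the regex'   # hypothetical sample input
word = 'of'

broken = r'\b' + word + '\b'             # pattern ends with a literal backspace
fixed = r'\b' + re.escape(word) + r'\b'  # both boundaries raw, word escaped

print(re.sub(broken, '', text))  # prints the text unchanged
print(re.sub(fixed, '', text))   # prints: a test  the regex
```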

I can only guess, but you probably want something like this:

pattern = r'\b({})\b'.format('|'.join(map(re.escape, exclusionList)))
first_word = re.sub(pattern, '', first_word)

Note that I'm escaping the words, so they will be matched literally instead of being interpreted as regular expressions (which they don't seem to be).
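To see why escaping matters for this particular list (a small illustration added here, not from the original answer): entries such as `f.` contain `.`, which in a regex matches any character. The snippet uses `re.fullmatch` so that only escaping, not word boundaries, is being tested. It also shows a side effect for the list's `'\+'` entry: that entry was already written as a regex, so once everything goes through `re.escape` it should probably be a plain `'+'`, assuming the intent was to remove plus signs.

```python
import re

# '.' is a metacharacter, so the unescaped entry 'f.' also matches 'fo'.
print(bool(re.fullmatch('f.', 'fo')))             # True
print(bool(re.fullmatch(re.escape('f.'), 'fo')))  # False
print(bool(re.fullmatch(re.escape('f.'), 'f.')))  # True

# '\+' escaped a second time matches a backslash followed by '+',
# not a lone '+'; the plain entry '+' does what was probably intended.
print(bool(re.fullmatch(re.escape('\\+'), '+')))    # False
print(bool(re.fullmatch(re.escape('\\+'), '\\+')))  # True
print(bool(re.fullmatch(re.escape('+'), '+')))      # True
```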

This should do the trick all at once:

exclusionRegex = r'\b(' + '|'.join(re.escape(word) for word in exclusionList) + r')\b'
first_word = re.sub(exclusionRegex, '', first_word)

EDIT: This is my test script:

import re

exclusionList = [r'\+','of','<ET>f.','to','the','<L>L.</L>','f.','in','and','see','a','<L>Fr.</L>','as','<ET>ad.','<ET>a.','<PS>v.</PS></XR>',
             'from','<CF>ab</CF>','or','n.','<L>OFr.</L>','pple.','away','was','with','off,','pa.','on','is','cf.','stem','ad.','which',
             'by','action','ppl.','Cf.','but','<L>Gr.</L>','be','after','=','The','form','for','an','<XR><RX>prec.</RX></XR>',
             '<PS>a.</PS></XR>','<L>Eng.</L>','<PS>pref.</PS>','also','L.</L>','<XR><XL>a-</XL>','<XR><XL>-ing</XL><HO>1</HO></XR>.</ET>',
             'vb.','See','In','<L>OE.</L>','used','it','see','this','not','<PS>prep.</PS><HO>1</HO></XR>','has','a','so','early','s']

exclusionRegex = r'\b(' + '|'.join(re.escape(word) for word in exclusionList) + r')\b'
first_word = 'This is a test of the regex'
print(re.sub(exclusionRegex, '', first_word))

And this is the output:

This   test   regex
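The `\b`-based patterns above have two limitations worth knowing about (this sketch is an editorial addition, not from either answer). First, `\b` needs a word character on one side, so entries that begin or end with punctuation, such as `f.`, `=` or `<L>L.</L>`, usually won't match when surrounded by spaces. Second, removed words leave their surrounding spaces behind, so the printed string really contains runs of spaces. A minimal sketch, assuming "whole word" means "not touching another letter, digit or underscore", swaps `\b` for the lookarounds `(?<!\w)` and `(?!\w)`, sorts the alternatives longest-first as a general safeguard (Python's alternation takes the first branch that matches, not the longest), and then collapses whitespace. The helper names `build_exclusion_regex` and `remove_words` and the short word list are hypothetical.

```python
import re

# Hypothetical helpers (not from either answer): remove whole-word occurrences
# of literal strings, including entries that start or end with punctuation.

def build_exclusion_regex(words):
    # Longest entries first, so an entry that is a prefix of a longer one
    # cannot win the alternation before the longer entry is tried.
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    alternation = '|'.join(re.escape(w) for w in ordered)
    # (?<!\w) and (?!\w) only forbid a neighbouring word character, so they
    # also work around entries such as 'f.' or '<L>L.</L>', unlike \b.
    return re.compile(r'(?<!\w)(?:' + alternation + r')(?!\w)')


def remove_words(text, regex):
    stripped = regex.sub('', text)
    # Collapse the whitespace runs left behind by removed entries
    # (this also trims the ends and joins lines, assuming one-line input).
    return ' '.join(stripped.split())


words = ['is', 'a', 'of', 'the', 'see', 'f.', '<L>L.</L>']
regex = build_exclusion_regex(words)

print(remove_words('This is a test of the regex', regex))  # This test regex
print(remove_words('see f. and <L>L.</L> here', regex))    # and here
```

For comparison, the `\b` version built from the same list leaves `f.` and `<L>L.</L>` in the second sentence untouched.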
