Regex for the last words in a string?
Say I have strings such as:
Woori Finance Holdings Co Ltd
Alliance One International Inc
I want to remove words like Co, Company, International, etc., regardless of case, as long as they are at the end of the string.
import re
pattern = re.compile(r'\b(Incorporated|Corporation|Company|Inc Common Stock|QQQ|ETF|PLC|SA|Inc|Corp|Ltd|LP|plc|Group|The|Co|International)$',
                     flags=re.IGNORECASE)
This regex matches the last word of the string, but how do I keep removing words from the end until I reach one that is not in the list? That is, the strings above should become:
Woori Finance Holdings
Alliance One
I also want to add that I don't want to remove Company if it appears at the start or in the middle of a string, only when it is part of the run of words at the end of the string.
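As a quick check of the behavior described in the question, this sketch runs the question's pattern unchanged on both example strings. The pattern can only match the final word, because every alternative must be followed directly by the end of the string ($). The replacement also leaves the preceding space behind. The loop and the printed format are illustrative only.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'\b(Incorporated|Corporation|Company|Inc Common Stock|QQQ|ETF|PLC|SA|Inc|Corp|Ltd|LP|plc|Group|The|Co|International)$',
                     flags=re.IGNORECASE)

for s in ['Woori Finance Holdings Co Ltd', 'Alliance One International Inc']:
    m = pattern.search(s)
    # Only the last word matches; the space before it survives the substitution.
    print(repr(m.group(0)), repr(pattern.sub('', s)))
# 'Ltd' 'Woori Finance Holdings Co '
# 'Inc' 'Alliance One International '
```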
You can use this regex to match one or more to-be-removed words at the end:
(?:\s+(?:Incorporated|Corporation|Company|Inc Common Stock|QQQ|ETF|PLC|SA|Inc|Corp|Ltd|LP|plc|Group|The|Co|International))+\s*$
In Python:
import re
regex = re.compile(r'(?:\s+(?:Incorporated|Corporation|Company|Inc Common Stock|QQQ|ETF|PLC|SA|Inc|Corp|Ltd|LP|plc|Group|The|Co|International))+\s*$', re.MULTILINE | re.IGNORECASE)
s = 'Alliance One International Inc'
s = regex.sub('', s)  # 'Alliance One'
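For readability, the same pattern can be assembled from a list of suffixes. This is a sketch: the names SUFFIXES and alternation are hypothetical, each entry goes through re.escape so it is matched literally, and re.MULTILINE is dropped because it only matters when the string contains newlines. Because every suffix word must be preceded by whitespace (\s+), the first word of a name is never removed. The last two test strings check the requirement that Company at the start or in the middle is left alone.

```python
import re

# Hypothetical list holding the same suffixes, in the same order as the alternation above.
SUFFIXES = ['Incorporated', 'Corporation', 'Company', 'Inc Common Stock', 'QQQ',
            'ETF', 'PLC', 'SA', 'Inc', 'Corp', 'Ltd', 'LP', 'plc', 'Group', 'The',
            'Co', 'International']
alternation = '|'.join(re.escape(word) for word in SUFFIXES)

regex = re.compile(
    r'(?:\s+(?:' + alternation + r'))+'  # one or more suffix words, each preceded by whitespace
    r'\s*$',                              # the run of suffix words must reach the end of the string
    re.IGNORECASE,
)

for s in ['Woori Finance Holdings Co Ltd',   # -> 'Woori Finance Holdings'
          'Alliance One International Inc',  # -> 'Alliance One'
          'Company Holdings Group',          # -> 'Company Holdings' (first word kept)
          'Acme Company Holdings']:          # -> unchanged ('Company' is not at the end)
    print(repr(regex.sub('', s)))
```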
You can use re.sub to remove the unnecessary trailing words:
import re
s1 = 'Woori Finance Holdings Co Ltd'
s2 = 'Alliance One International Inc'
# \s* also removes the space before the matched word, so no trailing whitespace is left
pattern = re.compile(r'\s*\b(Incorporated|Corporation|Company|Inc Common Stock|QQQ|ETF|PLC|SA|Inc|Corp|Co Ltd|Ltd|LP|plc|Group|The|Co|International)$', flags=re.IGNORECASE)
print(pattern.sub('', s1))
# Woori Finance Holdings
print(pattern.sub('', s2))
# Alliance One International
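Note that I've also added 'Co Ltd' as part of the pattern to be matched. Because each match must end at $ and re.sub does not rescan the string after a replacement, only one trailing suffix is removed, which is why s2 still ends in International.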
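One way to keep this answer's single-suffix pattern and still reach "Alliance One" is to apply it repeatedly until nothing more matches. This is a minimal sketch, not part of the original answer: strip_suffixes is a hypothetical helper, and the pattern uses the same alternation (including Co Ltd) but requires whitespace (\s+) before each suffix. That whitespace is removed along with the suffix, so the first word of a name is never stripped. re.subn returns the new string together with the number of replacements, which gives a natural stopping condition.

```python
import re

# Same alternation as this answer (including 'Co Ltd'), but each suffix must be
# preceded by whitespace (\s+), which is removed with it. As a result the first
# word of a name can never be stripped.
pattern = re.compile(
    r'\s+(Incorporated|Corporation|Company|Inc Common Stock|QQQ|ETF|PLC|SA|Inc|Corp'
    r'|Co Ltd|Ltd|LP|plc|Group|The|Co|International)$',
    flags=re.IGNORECASE,
)

def strip_suffixes(name: str) -> str:
    """Hypothetical helper: remove one trailing suffix per pass until none match."""
    while True:
        name, count = pattern.subn('', name)
        if count == 0:  # nothing left to strip
            return name

print(strip_suffixes('Woori Finance Holdings Co Ltd'))   # Woori Finance Holdings
print(strip_suffixes('Alliance One International Inc'))  # Alliance One
```

Each pass removes a non-empty match, so the string gets strictly shorter and the loop always terminates.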