
Extract words/sentence that occurs before a keyword from a string - Python

I have a string like this:

my_str ='·in this match, dated may 1, 2013 (the "the match") is between brooklyn centenniel, resident of detroit, michigan ("champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").'

Now I want to extract the current champion and the underdog, using the keywords champion and underdog.

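What makes this challenging is that each contender's name appears before its keyword, and the keyword itself is inside parentheses. I want to use regex to extract the information.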

Here is what I tried:

import re

champion = re.findall(r'("champion"[^.]*.)', my_str)
print(champion)

>> ['"champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").']


underdog = re.findall(r'("underdog"[^.]*.)', my_str)
print(underdog)

>>['"underdog").']

However, I need the result for champion to be:

brooklyn centenniel, resident of detroit, michigan

and for underdog:

kamil kubaru, the challenger from alexandria, virginia

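How can I do this with regex? (I have been searching for a way to step back a few words from the keyword to get the result I want, but no luck so far.) Any help or suggestions would be appreciated.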

You can use named capture groups to capture the desired results:

between\s+(?P<champion>.*?)\s+\("champion"\)\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)
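  • between\s+(?P<champion>.*?)\s+\("champion"\) matches the chunk from between up to ("champion") and puts the desired part in the middle into the named capture group champion

  • After that, \s+and\s+(?P<underdog>.*?)\s+\("underdog"\) matches the chunk up to ("underdog") and again puts the desired part into the named capture group underdog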

Example:

In [25]: import re

In [26]: my_str ='·in this match, dated may 1, 2013 (the "the match") is between brooklyn centenniel, resident of detroit, michigan ("champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").'

In [27]: out = re.search(r'between\s+(?P<champion>.*?)\s+\("champion"\)\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)', my_str)

In [28]: out.groupdict()
Out[28]: 
{'champion': 'brooklyn centenniel, resident of detroit, michigan',
 'underdog': 'kamil kubaru, the challenger from alexandria, virginia'}
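
Outside an IPython session, the same idea might look like the sketch below. The helper name `extract_parties` is hypothetical, and the pattern assumes the text always has the form `between ... ("champion") and ... ("underdog")`. Because `re.search` returns `None` when nothing matches, that case is checked explicitly.

```python
import re

# Same pattern as above, compiled once and split over two lines for readability.
PATTERN = re.compile(
    r'between\s+(?P<champion>.*?)\s+\("champion"\)'
    r'\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)'
)


def extract_parties(text):
    """Return a dict with 'champion' and 'underdog', or None if there is no match.

    Hypothetical helper: it assumes both keywords are present, in this order.
    """
    match = PATTERN.search(text)
    return match.groupdict() if match else None


my_str = ('·in this match, dated may 1, 2013 (the "the match") is between '
          'brooklyn centenniel, resident of detroit, michigan ("champion") and '
          'kamil kubaru, the challenger from alexandria, virginia ("underdog").')

print(extract_parties(my_str))
```

The lazy `.*?` quantifiers matter here: each group stops at the first point where the following `("champion")` or `("underdog")` can match, rather than consuming as much text as possible.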

There will be better answers than this one, and I don't know regex at all, but I was bored, so here are my 2 cents.

Here is how I would do it:

words = my_str.split()
index = words.index('("champion")')
champion = words[index - 6:index]
champion = " ".join(champion)

For the underdog, you would have to change 6 to 7 and '("champion")' to '("underdog").'

I'm not sure whether this solves your problem, but it worked for this particular string when I tested it.

If the trailing period on the underdog token is a problem, you can also use str.strip() to remove the punctuation.
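
A minimal sketch of the split-based approach for both contenders, with the trailing period handled by `str.rstrip()`. The helper name `words_before` is hypothetical, and the word counts (6 and 7) are hard-coded assumptions that hold only for this particular string.

```python
def words_before(text, keyword, count):
    """Return the `count` whitespace-separated words that precede `keyword`.

    Hypothetical helper: raises ValueError if `keyword` is not among the tokens.
    """
    words = text.split()
    # Compare tokens with any trailing period removed, so that the token
    # '("underdog").' is found when searching for '("underdog")'.
    stripped = [w.rstrip('.') for w in words]
    index = stripped.index(keyword)
    return " ".join(words[index - count:index])


my_str = ('·in this match, dated may 1, 2013 (the "the match") is between '
          'brooklyn centenniel, resident of detroit, michigan ("champion") and '
          'kamil kubaru, the challenger from alexandria, virginia ("underdog").')

champion = words_before(my_str, '("champion")', 6)
underdog = words_before(my_str, '("underdog")', 7)
print(champion)
print(underdog)
```

Because the counts depend on how many words each description contains, this breaks as soon as the wording changes. The regex answer above does not have that limitation.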

