
Python regex: Matching pattern repeatedly within a sentence

I have an expression of the form

some_text_0 pattern_instance_1 some_text_1 pattern_instance_2 some_text_2 pattern_instance_3 some_text_3 ..

where each pattern_instance is an instance of PATTERN,

and I would like to extract it as [pattern_instance_1, some_text_1], [pattern_instance_2, some_text_2], ... (dropping the first some_text_0).

What is the best way to do this?

As a more concrete case, I am trying to turn something like

Things I need to buy: 1 banana two apples three pears zero kiwis

into

[1, banana], [two, apples], ..

I already have the regex to match the numbers, but it's fairly complex. The few solutions I found seem to involve negating this regex in order to match the text in between, but I was wondering whether there is another way, as I am not sure how to negate my regex. I also tried playing with re.findall() but couldn't get it to work.

This is how I'd approach it...

  1. re.finditer will give you an iterator of match objects, one per occurrence of the pattern.

  2. Each match object has a start() method that gives you the index where the match begins. end() is analogous: it gives you the index just past the end of the match.

  3. Then, the only thing left is to build the pairs.

    • Create the first element by retrieving the text between the start() and end() indices.
    • Create the second element by retrieving the text between end() of this match object and start() of the next match object (or the end of the string if this was the last match).
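
The steps above can be sketched roughly as follows. This is only an illustration: NUMBER is a hypothetical, much simplified stand-in for the asker's real number regex (which was not posted), and the helper name pair_matches_with_following_text is made up for this example.

```python
import re

# Hypothetical stand-in for the asker's (more complex) number regex.
NUMBER = re.compile(r"\b(?:\d+|zero|one|two|three)\b")


def pair_matches_with_following_text(pattern, text):
    """Return [match, text_after_match] pairs, dropping the text before the first match."""
    # Materialise the iterator so each match can look at the one after it.
    matches = list(pattern.finditer(text))
    pairs = []
    for i, m in enumerate(matches):
        # The trailing text runs up to the next match, or to the end of the string.
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pairs.append([text[m.start():m.end()], text[m.end():next_start].strip()])
    return pairs


sentence = "Things I need to buy: 1 banana two apples three pears zero kiwis"
pairs = pair_matches_with_following_text(NUMBER, sentence)
print(pairs)
# [['1', 'banana'], ['two', 'apples'], ['three', 'pears'], ['zero', 'kiwis']]
```

Because only start() and end() of each match are used, the pattern never has to be negated; any compiled regex can be passed in unchanged. The .strip() call is a choice made here to drop the surrounding spaces and can be removed if the exact text is needed.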


 