
Python matching regex multiple times in a row (not the findall way)

This question is not about finding 'a' multiple times in a string, etc.

What I would like to do is match the regexp

[ a-zA-Z0-9]{1,3}\.

multiple times in a row. One way of doing this is with alternation (|):

'[ a-zA-Z0-9]{1,3}\.[ a-zA-Z0-9]{1,3}\.[ a-zA-Z0-9]{1,3}\.[ a-zA-Z0-9]{1,3}\.|[ a-zA-Z0-9]{1,3}\.[ a-zA-Z0-9]{1,3}\.[ a-zA-Z0-9]{1,3}\.|[ a-zA-Z0-9]{1,3}\.[ a-zA-Z0-9]{1,3}\.'

So this matches the regexp 4, 3, or 2 times in a row. It matches things like:

a. v. b.
m a.b.

Is there a more concise way to write this?

I tried

([ a-zA-Z0-9]{1,3}\.){2,4}

but it does not behave as I expected. This one returns:

regex.findall(string)
[u' b.', u'b.']

where string is:

a. v. b. split them a.b. split somethinf words. THen we say some more words, like ten

Is there any way to do this? The goal is to match possible English abbreviations and names like Mary JE: things that the sentence tokenizer treats as sentence-ending punctuation but that are not.

I want to match all of these:

U.S. , c.v.a.b. , a. v. p. 

First of all, your regex does work as you expect:

>>> s="aa2.jhf.jev.d23.llo."
>>> import re
>>> re.search(r'([ a-zA-Z0-9]{1,3}\.){2,4}',s).group(0)
'aa2.jhf.jev.d23.'
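The output in the question can be reproduced with a minimal sketch like the one below. It assumes Python 3 (so no u'' prefixes) and uses the sample string from the question; the variable names are illustrative only. It shows that the whole match is available as group(0), while group 1, which is what findall reports for a single-group pattern, holds only the last repetition.

```python
import re

# Sample text from the question (typos kept so the matches stay the same).
text = "a. v. b. split them a.b. split somethinf words. THen we say some more"
pattern = re.compile(r'([ a-zA-Z0-9]{1,3}\.){2,4}')

first = pattern.search(text)
whole = first.group(0)      # the entire matched substring
last_rep = first.group(1)   # a repeated group keeps only its last repetition

# With exactly one capture group, findall returns that group for each match.
found = pattern.findall(text)

print(repr(whole))     # 'a. v. b.'
print(repr(last_rep))  # ' b.'
print(found)           # [' b.', 'b.']
```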

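But if you want findall to return whole substrings such as U.S., c.v.a.b., and a. v. p., you need to put the whole regex in a capture group. When a pattern contains groups, findall returns the groups rather than the full match, and a repeated group keeps only its last repetition: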

>>> s = 'a. v. b. split them a.b. split somethinf words. THen we say some more'
>>> re.findall(r'(([ a-zA-Z0-9]{1,3}\.){2,4})',s)
[('a. v. b.', ' b.'), ('m a.b.', 'b.')]

Then use a list comprehension to keep the first element of each tuple, which is the whole match:

>>> [i[0] for i in re.findall(r'(([ a-zA-Z0-9]{1,3}\.){2,4})',s)]
['a. v. b.', 'm a.b.']
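As a possible alternative, not part of the original answer, the same list can be obtained without the outer group and the list comprehension. The sketch below assumes Python 3 and shows two standard options from the re module: a non-capturing group (?:...), which makes findall return whole matches, and finditer, whose match objects always expose the whole match as group(0).

```python
import re

# Sample text from the question (typos kept so the matches stay the same).
text = "a. v. b. split them a.b. split somethinf words. THen we say some more"

# Option 1: (?:...) groups without capturing, so findall returns full matches.
non_capturing = re.compile(r'(?:[ a-zA-Z0-9]{1,3}\.){2,4}')
via_findall = non_capturing.findall(text)

# Option 2: keep the capturing group, but read group(0) from each match object.
capturing = re.compile(r'([ a-zA-Z0-9]{1,3}\.){2,4}')
via_finditer = [m.group(0) for m in capturing.finditer(text)]

print(via_findall)   # ['a. v. b.', 'm a.b.']
print(via_finditer)  # ['a. v. b.', 'm a.b.']
```

Because the character class includes a space, matches can start with stray characters from the preceding word (the 'm ' in 'm a.b.' comes from "them"), so some post-processing may still be needed depending on the input.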
