
Python re.search


I have a string variable containing:

string = "123hello456world789"

The string contains no spaces. I want to write a regex that prints only the words made up of letters (a-z). I tried a simple regex:

pat = "([a-z]+){1,}"
match = re.search(pat, string, re.DEBUG)

The match object contains only the word hello; the word world is not matched.

When I used re.findall(), I got both hello and world.

My question is: why can't we do this with re.search()?

How can I do this with re.search()?

re.search() finds the pattern only once in the string. From the documentation:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

To match every occurrence, you need re.findall(). From the documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
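The note about groups in the quoted documentation can be illustrated with a short sketch. The second pattern, with its extra `(\d+)` group, is made up for this illustration and is not part of the question:

```python
import re

text = "123hello456world789"

# One capturing group: findall returns a flat list of strings.
one_group = re.findall(r'([a-z]+)', text)
print(one_group)   # ['hello', 'world']

# Two capturing groups (hypothetical pattern): findall returns a list of tuples.
two_groups = re.findall(r'([a-z]+)(\d+)', text)
print(two_groups)  # [('hello', '456'), ('world', '789')]
```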

Example:

>>> import re
>>> regex = re.compile(r'([a-z]+)', re.I)
>>> # using search we only get the first item.
>>> regex.search("123hello456world789").groups()
('hello',)
>>> # using findall we get every item.
>>> regex.findall("123hello456world789")
['hello', 'world']
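If you really do want to stay with search(), one possible approach is to call it repeatedly on a compiled pattern and pass the end of the previous match as the start position. This is only a sketch: the names `text`, `words` and `pos` are chosen for the illustration, and it assumes the pattern can never match an empty string (otherwise the loop would not advance).

```python
import re

regex = re.compile(r'[a-z]+')
text = "123hello456world789"

words = []
pos = 0
while True:
    # Pattern.search accepts an optional start position.
    match = regex.search(text, pos)
    if match is None:
        break
    words.append(match.group())
    pos = match.end()  # resume scanning right after this match

print(words)  # ['hello', 'world']
```

In practice this just reimplements what findall() and finditer() already do, so those are usually the simpler choice.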

UPDATE:

Due to your duplicate question (as discussed at this link), I have added my other answer here as well:

>>> import re
>>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D") # this has uppercase
[]  # there are no results here, because the string is uppercase
>>> regex.findall("HELLO W-O-R-L-D".lower()) # lets lowercase
['hello', 'w-o-r-l-d'] # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']

As you can see, the first sample you provided failed because it is uppercase. You can simply add the re.IGNORECASE flag, though you mentioned that matches should be lowercase only.
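As a sketch of the re.IGNORECASE option mentioned above, the same pattern can be compiled with the flag so that the uppercase sample matches without calling lower(). The variable name `upper_words` is chosen for this illustration:

```python
import re

# Same pattern as above, compiled case-insensitively.
regex = re.compile(r'([a-z][a-z-\']+[a-z])', re.IGNORECASE)

upper_words = regex.findall("HELLO W-O-R-L-D")
print(upper_words)  # ['HELLO', 'W-O-R-L-D']
```

Note that the matches keep their original case, so this only fits if uppercase results are acceptable.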

@InbarRose's answer shows why re.search works that way, but if you want match objects rather than just the string outputs from re.findall, use re.finditer:

>>> for match in re.finditer(pat, string):
...     print(match.groups())
...
('hello',)
('world',)

Alternatively, if you want a list:

>>> list(re.finditer(pat, string))
[<_sre.SRE_Match object at 0x022DB320>, <_sre.SRE_Match object at 0x022DB660>]
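The advantage of the match objects returned by re.finditer is that each one carries position information as well as the text. A minimal sketch, using `text` instead of `string` as the variable name and a simplified version of the question's pattern:

```python
import re

pat = "([a-z]+)"
text = "123hello456world789"

# Collect each matched word together with its (start, end) span.
found = [(m.group(1), m.span()) for m in re.finditer(pat, text)]
print(found)  # [('hello', (3, 8)), ('world', (11, 16))]
```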

It's also generally a bad idea to use string as a variable name, since it shadows the standard-library string module.
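A small sketch of how that shadowing can bite, assuming the string module has been imported earlier in the same scope (the names `letters` and `shadowed` are illustrative):

```python
import string

letters = string.ascii_lowercase  # works: `string` is the module here

string = "123hello456world789"    # rebinding the name hides the module

try:
    string.ascii_lowercase        # now looked up on a str object
    shadowed = False
except AttributeError:
    shadowed = True               # str has no ascii_lowercase attribute

print(shadowed)  # True
```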
