
Python match a string with regex

I need a Python regular expression to check if a word is present in a string. The string is potentially separated by commas.

So for example,

line = 'This,is,a,sample,string'

I want to search based on "sample"; this should return true. I am not good with regex, so when I looked at the Python docs, I saw something like

import re
re.match(r'sample', line)

But I don't know why there was an 'r' before the text to be matched. Can someone help me with the regular expression?

Are you sure you need a regex? It seems that you only need to know if a word is present in a string, so you can do:

>>> line = 'This,is,a,sample,string'
>>> "sample" in line
True
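One caveat with the plain `in` test: it checks for a substring, so a fragment such as 'samp' also counts as present. If whole comma-separated fields are what matter, a minimal sketch is to split first. This assumes the fields are separated by bare commas with no surrounding spaces or quoting; the variable names are illustrative.

```python
line = 'This,is,a,sample,string'

# Substring test: also true for partial words such as 'samp'
substring_hit = 'samp' in line

# Field test: split on commas and compare whole fields
fields = line.split(',')
field_hit = 'sample' in fields
partial_field_hit = 'samp' in fields

print(substring_hit, field_hit, partial_field_hit)  # True True False
```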

The r makes the string a raw string, which doesn't process escape characters (however, since there are none in the string, it is actually not needed here).

Also, re.match matches from the beginning of the string. In other words, the pattern has to match starting at the first character; it does not look further into the string. To match something that could be anywhere in the string, use re.search. See a demonstration below:

>>> import re
>>> line = 'This,is,a,sample,string'
>>> re.match("sample", line)
>>> re.search("sample", line)
<_sre.SRE_Match object at 0x021D32C0>
>>>

r stands for a raw string, so a backslash (\) is kept as a literal character instead of being processed by Python as the start of an escape sequence.

Normally, if you wanted your pattern to include something like a backslash, you'd need to escape it with another backslash. Raw strings eliminate this problem.
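As a small illustration of that point, the sketch below builds the same three-character pattern both ways. The pattern `\d+` is only an example chosen here, not something from the question.

```python
# A normal string literal needs a doubled backslash to hold one backslash
normal = '\\d+'
# A raw string literal keeps the backslash as written
raw = r'\d+'

print(normal == raw)  # True: both are the three characters \, d, +
print(len(raw))       # 3
```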

Short explanation:

In your case it does not matter much, but it's a good habit to get into early. Otherwise something like \b will bite you if you are not careful: in a normal string it is interpreted as a backspace character instead of a word boundary.
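The difference can be seen directly on the line from the question. This is a sketch, with variable names chosen for illustration; it also shows that a word-boundary pattern rejects partial words, which a plain substring test would accept.

```python
import re

line = 'This,is,a,sample,string'

# Raw string: the regex engine receives \b and treats it as a word boundary
word_match = re.search(r'\bsample\b', line)

# Normal string: Python turns '\b' into a backspace character (\x08) first,
# so the pattern looks for backspace + 'sample' + backspace
backspace_match = re.search('\bsample\b', line)

# Word boundaries also reject partial words such as 'samp'
partial_match = re.search(r'\bsamp\b', line)

print(word_match is not None)   # True
print(backspace_match is None)  # True
print(partial_match is None)    # True
```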

As for re.match vs re.search, here's an example that will clarify it for you:

>>> import re
>>> testString = 'hello world'
>>> re.match('hello', testString)
<_sre.SRE_Match object at 0x015920C8>
>>> re.search('hello', testString)
<_sre.SRE_Match object at 0x02405560>
>>> re.match('world', testString)
>>> re.search('world', testString)
<_sre.SRE_Match object at 0x015920C8>

So search will find a match anywhere in the string, while match will only find one that starts at the beginning.
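For completeness, here is a sketch that puts the two functions next to re.fullmatch, which requires the pattern to cover the whole string. re.fullmatch is not mentioned in the answers above and is available in Python 3.4 and later; the variable names are illustrative.

```python
import re

test_string = 'hello world'

# match: the pattern must start at index 0, but may stop before the end
match_start = bool(re.match('hello', test_string))
# match fails when the pattern only occurs later in the string
match_later = bool(re.match('world', test_string))
# search: the pattern may start anywhere
search_later = bool(re.search('world', test_string))
# fullmatch: the pattern must cover the whole string
full_partial = bool(re.fullmatch('hello', test_string))
full_whole = bool(re.fullmatch('hello world', test_string))

print(match_start, match_later, search_later)  # True False True
print(full_partial, full_whole)                # False True
```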

You do not need regular expressions to check if a substring exists in a string.

line = 'This,is,a,sample,string'
result = 'sample' in line  # True ("in" already returns a bool)

If you want to know if a string contains a pattern, then you should use re.search:

import re
line = 'This,is,a,sample,string'
result = re.search(r'sample', line)  # match object for 'sample', or None if absent

This is best used with pattern matching, for example:

line = 'my name is bob'
result = re.search(r'my name is (\S+)', line) # finds 'bob'
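To actually get at the captured text, the match object has to be queried. A minimal sketch, with `name` being a variable introduced here for illustration:

```python
import re

line = 'my name is bob'
result = re.search(r'my name is (\S+)', line)

# re.search returns None when nothing matches, so check before using it
name = None
if result:
    name = result.group(1)  # text captured by the first parenthesised group

print(name)  # bob
```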

As everyone else has mentioned, it is better to use the "in" operator. It can also be applied to each item of a list:

line = "This,is,a,sample,string"
lst = ['This', 'sample']
for i in lst:
    print(i in line)  # print the result; a bare expression shows nothing in a script

>> True
>> True

One-liner implementation:

a=[1,3]
b=[1,2,3,4]
all(i in b for i in a)
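The same one-liner idea can be applied to the string from the question. This is a sketch; `words` and the result names are illustrative, and the checks are substring tests, so partial words would also count.

```python
line = 'This,is,a,sample,string'
words = ['This', 'sample']

# True only if every word occurs somewhere in the line
all_present = all(word in line for word in words)
# True if at least one of the candidates occurs in the line
any_present = any(word in line for word in ['missing', 'sample'])

print(all_present, any_present)  # True True
```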
