Regex match until any characters are found except for the word 'and'
I would like to find a regex solution to match the part of the string after Item Number(s) until any characters are found, except if it is the word 'and'.
s = 'this part 123 should be ignored Item Number(s)92349252 and 30239429434, 124029354,345340332, 234325923 hallo 2121124'
It works if I specifically add hallo:
re.match(r'.*?Item Number\(s\)(.*?)hallo.*$', s).group(1)
'92349252 and 30239429434, 124029354,345340332, 234325923 '
However, I want it to work for any characters (including hallo), except if it is the word and.
You don't need regex; just use:
a,b,c = s.partition("and")
print(c)
c is the part after the first occurrence of and.
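As a quick check, here is a minimal sketch of what str.partition returns for the sample string from the question. It splits at the first occurrence of the separator, so c also carries the trailing hallo 2121124, and the number in front of and ends up in a rather than in c.

```python
s = 'this part 123 should be ignored Item Number(s)92349252 and 30239429434, 124029354,345340332, 234325923 hallo 2121124'

# str.partition returns (text before, separator, text after),
# splitting at the first occurrence of the separator only
a, b, c = s.partition("and")

print(repr(a))  # everything before the first "and", including 92349252
print(repr(b))  # the separator itself
print(repr(c))  # everything after the first "and", including "hallo 2121124"
```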
We can try using a combination of string split with re.findall. First, split the input on the text Item Number(s), and retain the second entry in the array. This corresponds to all text to the right of Item Number(s). Then, use re.split to split on a space followed by some content which is not the word and, a digit, whitespace, or a comma, and keep the first piece. Finally, use re.findall to capture all numbers from the remaining text.
import re
s = 'this part 123 should be ignored Item Number(s)92349252 and 30239429434, 124029354,345340332, 234325923 hallo 2121124'
nums = re.findall(r'\b\d+\b', re.split(r' (?!\band\b|[\d\s,])', s.split('Item Number(s)')[1])[0])
print(nums)
['92349252', '30239429434', '124029354', '345340332', '234325923']
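The one-liner above can be hard to follow, so here is the same logic unpacked into the three steps described in the answer. The intermediate names tail and head are only illustrative and do not appear in the original.

```python
import re

s = 'this part 123 should be ignored Item Number(s)92349252 and 30239429434, 124029354,345340332, 234325923 hallo 2121124'

# Step 1: keep everything to the right of the literal text "Item Number(s)"
tail = s.split('Item Number(s)')[1]

# Step 2: cut at the first space that is NOT followed by the word "and",
# a digit, whitespace, or a comma, and keep the part before the cut
head = re.split(r' (?!\band\b|[\d\s,])', tail)[0]

# Step 3: collect every run of digits from what is left
nums = re.findall(r'\b\d+\b', head)

print(repr(tail))
print(repr(head))
print(nums)
```

Here head stops right before hallo, which is why 2121124 is not part of the result.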
I stated the question incorrectly. The correct question is: find a string containing numbers after Item Number(s) until a word is found, except if this word is and.
Rephrasing: find the string after Item Number(s) that consists of one or more numbers, where each number is preceded either by zero or more non-word characters, or by one or more repetitions of the word 'and' (each preceded by at least one non-word character) followed by zero or more non-word characters.
import re
s = '123 ignore Item Number(s)92349252 and,,;^and,and;;;30239429434, 124029354,345340332, and and 234325923 hallo 2121124'
pattern = r'.*?Item Number\(s\)(((\W*?|(\W+?and)+\W*?)\d+)+)'
m = re.match(pattern, s).group(1)
numbers = re.findall(r'\d+', m)
print(numbers)
The output is:
['92349252', '30239429434', '124029354', '345340332', '234325923']
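Note that re.match(...).group(1) raises AttributeError when the input contains no Item Number(s) section, because re.match returns None in that case. A minimal sketch of a wrapper that returns an empty list instead, assuming that is the desired behavior; the names extract_item_numbers and ITEM_NUMBERS are hypothetical, and the pattern is the one from the answer with the leading .*? dropped in favor of re.search:

```python
import re

# Same capture group as in the answer; used with search() instead of a leading ".*?"
ITEM_NUMBERS = re.compile(r'Item Number\(s\)(((\W*?|(\W+?and)+\W*?)\d+)+)')

def extract_item_numbers(text):
    """Return the numbers listed after 'Item Number(s)', or [] if there are none."""
    m = ITEM_NUMBERS.search(text)
    if m is None:
        return []
    # group(1) is the whole run of numbers, separators, and 'and' words
    return re.findall(r'\d+', m.group(1))

s1 = 'this part 123 should be ignored Item Number(s)92349252 and 30239429434, 124029354,345340332, 234325923 hallo 2121124'
s2 = '123 ignore Item Number(s)92349252 and,,;^and,and;;;30239429434, 124029354,345340332, and and 234325923 hallo 2121124'

print(extract_item_numbers(s1))
print(extract_item_numbers(s2))
print(extract_item_numbers('no item numbers here 42'))
```

Both sample strings should yield the same five numbers, and the last call should print an empty list.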