RegEx for matching a word followed by slash and 10 digits
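I have a string, and I am trying to find all substrings that start with mystring/ and end with a 10-digit ID number. I would like to output a list of all these IDs with the string attached.
I don't know regex very well, but I am guessing that re is the library to use here. I started with the code below: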
import re
source = 'mystring/1234567890 hello world mystring/2345678901 hello'
re.findall("mystring/",source)
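Here, we can use two capturing groups: the outer group captures mystring/ together with the ID, and the inner group captures the ID alone: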
(mystring\/([0-9]{10}))
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"(mystring\/([0-9]{10}))"
test_str = "hello mystring/1234567890 hello world mystring/2345678901 hellomystring/1234567890 hello world mystring/2345678901 hello"
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
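If only the two lists are needed rather than the match positions, the same two-group expression can be passed to re.findall, which returns one tuple per match when the pattern has more than one group. This is a minimal sketch, not part of the answer above; the names pairs, with_prefix and ids_only are made up for illustration, and the slash is left unescaped because Python does not require escaping it.

```python
import re

# Same two capturing groups as above; "/" needs no escaping in Python.
pattern = re.compile(r"(mystring/([0-9]{10}))")
text = "hello mystring/1234567890 hello world mystring/2345678901 hello"

# With two groups, findall returns (group 1, group 2) tuples.
pairs = pattern.findall(text)

# Hypothetical helper lists: IDs with the prefix attached, and bare IDs.
with_prefix = [full for full, _ in pairs]
ids_only = [digits for _, digits in pairs]

print(with_prefix)  # ['mystring/1234567890', 'mystring/2345678901']
print(ids_only)     # ['1234567890', '2345678901']
```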
If this expression isn't what you need, it can be modified or changed on regex101.com.
jex.im visualizes regular expressions.
A JavaScript test of the same expression:
const regex = /(mystring\/([0-9]{10}))/gm;
const str = `hello mystring/1234567890 hello world mystring/2345678901 hellomystring/1234567890 hello world mystring/2345678901 hello`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
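You could use a word boundary \b to prevent mystring from being part of a larger word, then match a forward slash followed by 10 digits using a quantifier, \d{10}: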
\bmystring/\d{10}
For example:
import re
source = 'mystring/1234567890 hello world mystring/2345678901 hello'
print(re.findall(r"\bmystring/\d{10}",source))
Result:
['mystring/1234567890', 'mystring/2345678901']
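To see what the word boundary changes, the pattern can be run with and without \b on an input where mystring is embedded in a longer word. This is only an illustrative sketch; the sample text and the notmystring token are invented for the comparison.

```python
import re

# Invented sample: the second ID hangs off "notmystring", not "mystring".
text = "mystring/1234567890 notmystring/2345678901 mystring/3456789012"

# Without \b the pattern also matches inside "notmystring".
without_boundary = re.findall(r"mystring/\d{10}", text)

# With \b there must be a word boundary right before the "m" of mystring.
with_boundary = re.findall(r"\bmystring/\d{10}", text)

print(without_boundary)
# ['mystring/1234567890', 'mystring/2345678901', 'mystring/3456789012']
print(with_boundary)
# ['mystring/1234567890', 'mystring/3456789012']
```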
If you want to list only the digits, you could use a positive lookbehind:
(?<=\bmystring/)\d{10}(?!\S)
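(?<=\bmystring/) - positive lookbehind, asserts that what is directly to the left is mystring/
\d{10} - matches 10 digits
(?!\S) - negative lookahead, asserts that what is directly to the right is not a non-whitespace character
You can use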
r"\bmystring/(\d{10})(?!\d)"
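See the regex demo.
Details
\bmystring/ - a word boundary, so that mystring is matched only as a whole word, followed by /
(\d{10}) - capturing group #1: 10 digits
(?!\d) - not followed by another digit.
Python demo: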
import re
source = 'mystring/1234567890 hello world mystring/2345678901 hello'
matches = re.finditer(r"\bmystring/(\d{10})(?!\d)", source)
for match in matches:
    print("Whole match: {}".format(match.group(0)))
    print("Group 1: {}".format(match.group(1)))
Output:
Whole match: mystring/1234567890
Group 1: 1234567890
Whole match: mystring/2345678901
Group 1: 2345678901
Or, just use
print(re.findall(r"\bmystring/(\d{10})(?!\d)", source))
which outputs the list of IDs: ['1234567890', '2345678901'].
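The two trailing assertions used in the answers, (?!\S) and (?!\d), behave differently when an ID is followed by punctuation or by extra digits. The sketch below compares them on an invented input (a comma after the first ID and an 11-digit number as the second); the variable names are illustrative only.

```python
import re

# Invented sample: first ID is followed by ",", second has 11 digits.
text = "mystring/1234567890, mystring/23456789012 mystring/3456789012"

# No trailing check: the first 10 digits of the 11-digit number are accepted.
no_check = re.findall(r"\bmystring/(\d{10})", text)

# (?!\d): rejects the 11-digit number, accepts an ID followed by a comma.
not_digit = re.findall(r"\bmystring/(\d{10})(?!\d)", text)

# (?!\S): requires whitespace or end of string right after the ID.
not_nonspace = re.findall(r"(?<=\bmystring/)\d{10}(?!\S)", text)

print(no_check)      # ['1234567890', '2345678901', '3456789012']
print(not_digit)     # ['1234567890', '3456789012']
print(not_nonspace)  # ['3456789012']
```

Which one fits depends on the data: (?!\d) is the looser check and still accepts IDs that end a sentence, while (?!\S) only accepts IDs that stand alone as whitespace-delimited tokens.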