
RegEx for matching a word followed by slash and 10 digits

I have a string which I am trying to search through for all strings that begin with mystring/ and end with a 10 digit id number. I'd like to output a list of all these ids with the attached string.

I don't really know regex, but I'm guessing that is the library to use here. I've started it off below:

import re
source = 'mystring/1234567890 hello world mystring/2345678901 hello'
re.findall("mystring/",source)

Here, we could use two capturing groups: the outer group captures mystring/ together with the ID, and the inner group captures the 10-digit ID on its own:

(mystring\/([0-9]{10}))

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(mystring\/([0-9]{10}))"

test_str = "hello mystring/1234567890 hello world mystring/2345678901 hellomystring/1234567890 hello world mystring/2345678901 hello"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
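
As a more compact sketch of the same idea, re.findall can be used with the two-group pattern; when a pattern has more than one group, it returns one tuple per match. The variable names here (pattern, source, pairs) are only illustrative, and the \/ escape is dropped because a forward slash needs no escaping in Python:

```python
import re

# Same pattern as above: group 1 is the full "mystring/<id>", group 2 is the ID alone.
pattern = re.compile(r"(mystring/([0-9]{10}))")

source = "mystring/1234567890 hello world mystring/2345678901 hello"

# With two groups, findall yields one (group 1, group 2) tuple per match.
pairs = pattern.findall(source)
print(pairs)
# [('mystring/1234567890', '1234567890'), ('mystring/2345678901', '2345678901')]
```

Each tuple holds the ID with the attached string first and the bare ID second, so either form can be picked out as needed.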


RegEx

If this expression is not quite what you need, it can be modified and tested on regex101.com.

RegEx Circuit

jex.im can be used to visualize regular expressions.

Demo

const regex = /(mystring\/([0-9]{10}))/gm;
const str = `hello mystring/1234567890 hello world mystring/2345678901 hellomystring/1234567890 hello world mystring/2345678901 hello`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

You could use a word boundary \b to prevent mystring from being part of a larger word, then match a forward slash followed by 10 digits with \d{10}, where {10} is a quantifier:

\bmystring/\d{10}


For example:

import re
source = 'mystring/1234567890 hello world mystring/2345678901 hello'
print(re.findall(r"\bmystring/\d{10}",source))

Result:

['mystring/1234567890', 'mystring/2345678901']
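
To see what the word boundary changes, here is a small sketch that runs the pattern with and without \b on the longer test string from the earlier answer, which contains hellomystring/1234567890. The variable names are only illustrative:

```python
import re

# Test string borrowed from the earlier answer; note the "hellomystring/..." part.
text = ("hello mystring/1234567890 hello world mystring/2345678901 "
        "hellomystring/1234567890 hello world mystring/2345678901 hello")

# \b requires that "mystring" does not continue a longer word.
with_boundary = re.findall(r"\bmystring/\d{10}", text)
without_boundary = re.findall(r"mystring/\d{10}", text)

print(len(with_boundary))     # 3
print(len(without_boundary))  # 4
```

Without \b, the occurrence embedded in hellomystring is matched as well, which may or may not be what you want.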

If you want to list only the digits, you could alternatively use a positive lookbehind:

(?<=\bmystring/)\d{10}(?!\S)
  • (?<=\bmystring/) Positive lookbehind, assert that what is directly on the left is mystring/ starting at a word boundary
  • \d{10} Match 10 digits
  • (?!\S) Negative lookahead, assert that what is directly on the right is not a non-whitespace character
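
A minimal sketch of the lookbehind pattern above in Python, using the sample string from the question. The second string is made up for illustration, to show that the (?!\S) lookahead rejects an ID followed directly by a non-whitespace character:

```python
import re

lookbehind = r"(?<=\bmystring/)\d{10}(?!\S)"

source = "mystring/1234567890 hello world mystring/2345678901 hello"

# Only the digits are part of the match; the lookarounds consume nothing.
ids = re.findall(lookbehind, source)
print(ids)  # ['1234567890', '2345678901']

# Hypothetical input: the comma right after the ID makes (?!\S) fail.
rejected = re.findall(lookbehind, "mystring/1234567890, next")
print(rejected)  # []
```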


You may use

r"\bmystring/(\d{10})(?!\d)"

See the regex demo .

Details

  • \bmystring/ - a word boundary followed by mystring/, so that mystring is only matched as a whole word with / at the end
  • (\d{10}) - capturing group #1: 10 digits
  • (?!\d) - a negative lookahead: the 10 digits must not be followed by another digit.

Python demo:

import re
source = 'mystring/1234567890 hello world mystring/2345678901 hello'
matches = re.finditer(r"\bmystring/(\d{10})(?!\d)", source)
for match in matches:
    print("Whole match: {}".format(match.group(0)))
    print("Group 1: {}".format(match.group(1)))

Output:

Whole match: mystring/1234567890
Group 1: 1234567890
Whole match: mystring/2345678901
Group 1: 2345678901

Or just use

print(re.findall(r"\bmystring/(\d{10})(?!\d)", source))

which will output a list of only the IDs: ['1234567890', '2345678901'].
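
To illustrate what the (?!\d) lookahead guards against, here is a short sketch with a made-up sample in which the first ID has 11 digits. The sample string and variable names are assumptions for illustration only:

```python
import re

# Hypothetical sample: the first ID has 11 digits, the second has exactly 10.
sample = "mystring/12345678901 mystring/1234567890"

# With the lookahead, an ID that runs on into an 11th digit is skipped.
strict = re.findall(r"\bmystring/(\d{10})(?!\d)", sample)
print(strict)  # ['1234567890']

# Without it, the first 10 digits of the longer number are picked up too.
loose = re.findall(r"\bmystring/(\d{10})", sample)
print(loose)  # ['1234567890', '1234567890']
```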
