
Regex, capture using word boundaries without stopping at "dot" and/or other characters

Given for example a string like this:
random word, random characters##?, some dots. username bob.1234 other stuff

I'm currently using this regex to capture the username (bob.1234):

\busername (.+?)(,| |$)

But my code needs a regex with only one capture group, because Python's re.findall returns a list of tuples rather than a list of strings when the pattern has multiple capture groups. Something like this would almost work, except it captures the username "bob" instead of "bob.1234":

\busername (.+?)\b

Does anybody know if there is a way to use the word boundary while ignoring the dot, without using more than one capture group?

NOTES:

  • Sometimes there is a comma after the username
  • Sometimes there is a space after the username
  • Sometimes the string ends with the username

The \busername (.+?)(,| |$) pattern contains 2 capturing groups, so re.findall will return a list of tuples once a match is found. See the re.findall reference:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
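To make the difference concrete, here is a small sketch that runs both patterns from the question against the sample string. The variable names are only illustrative.

```python
import re

s = "random word, random characters##?, some dots. username bob.1234 other stuff"

# Two capturing groups: findall returns a list of tuples, one element per group.
two_groups = re.findall(r'\busername (.+?)(,| |$)', s)
print(two_groups)  # [('bob.1234', ' ')]

# One capturing group: findall returns a plain list of strings,
# but \b stops the lazy match at the boundary between "bob" and ".".
one_group = re.findall(r'\busername (.+?)\b', s)
print(one_group)  # ['bob']
```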

So, there are three approaches here:

  1. Use a (?:...) non-capturing group rather than the capturing one: re.findall(r'\busername (.+?)(?:,| |$)', s). It consumes the trailing comma or space, but that is fine because only the captured part is returned and no overlapping matches are expected.
  2. Use a positive lookahead instead: re.findall(r'\busername (.+?)(?=,| |$)', s). The space or comma is not consumed; that is the only difference from the first approach.
  3. Turn the (.+?)(,| |$) part into a simple negated character class, [^ ,]+, which matches one or more characters other than a space or comma: re.findall(r'\busername ([^ ,]+)', s). It matches up to the end of the string if there is no comma or space after the username.
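The three approaches above can be compared side by side with a quick sketch. The sample strings are made up to cover the three cases from the question's notes (space, comma, end of string), and the dictionary keys are just labels.

```python
import re

# Hypothetical inputs covering the three cases listed in the question.
samples = [
    "some dots. username bob.1234 other stuff",   # space after the username
    "some dots. username bob.1234, other stuff",  # comma after the username
    "some dots. username bob.1234",               # string ends with the username
]

patterns = {
    "non-capturing group": r'\busername (.+?)(?:,| |$)',
    "positive lookahead": r'\busername (.+?)(?=,| |$)',
    "negated character class": r'\busername ([^ ,]+)',
}

# Each pattern has exactly one capturing group, so findall returns a list of strings.
results = {
    name: [re.findall(pattern, s) for s in samples]
    for name, pattern in patterns.items()
}

for name, found in results.items():
    print(name, found)
```

All three patterns should give ['bob.1234'] for every sample, so the choice is mostly a matter of taste. The negated character class avoids lazy matching altogether, which usually makes it the simplest to reason about.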
