python regex - find one of several optional groupings of characters

I have the following (discrete) strings:

um
yum
umpire
µm
mi
micro

These strings would be found as-is, not as part of a longer text. (They are possible cell values in a spreadsheet.)

I wish to find all strings that are either "um" or "µm" or "mi" or "micro" (but not umpire or yum)

I am struggling with understanding testing for character groupings. Here is what I have:

[(um)(µm)(mi)]

I've also tried variations, such as:

^[(?:um)|(?:µm)|(?:mi)]

But haven't yet found the magic.

RegEx 101 Demo

The desired outcome is that, of the strings listed at the top, only the following return True:

"um", "µm", "mi", "micro"

You may use this regex with anchors:

^(?:[uµ]m|mi(?:cro)?)$

Updated RegEx Demo

RegEx Details:

  • ^ : Start
  • (?: : Start non-capture group. The group limits the | alternation to the options inside it; the ?: makes it non-capturing, meaning the matched text is not stored as a numbered group
    • [uµ]m : Match u or µ followed by m, i.e. um or µm
    • | : OR
    • mi(?:cro)? : Match mi, optionally followed by cro, i.e. mi or micro
  • ) : End non-capture group
  • $ : End
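The pattern broken down above can be tried against the six sample strings from the question. This is only a minimal sketch: the names UNIT_RE, cells, and results are made up for illustration, and it assumes each spreadsheet cell value has already been read into a Python string.

```python
import re

# Pattern from the answer: the whole string must be um, µm, mi, or micro.
UNIT_RE = re.compile(r'^(?:[uµ]m|mi(?:cro)?)$')

# The six sample cell values from the question.
cells = ['um', 'yum', 'umpire', 'µm', 'mi', 'micro']

# search() returns a match object or None; bool() turns that into True/False.
results = {cell: bool(UNIT_RE.search(cell)) for cell in cells}

for cell, matched in results.items():
    print(f'{cell!r}: {matched}')
```

With these inputs, only 'yum' and 'umpire' should come back False.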

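The ^ and $ anchors ensure there is a match if and only if the whole string is one of the alternatives. The group is what makes this work: it keeps the | alternation inside the anchors, so ^ and $ apply to every alternative rather than only to the first and the last. The ?: simply makes that group non-capturing.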
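To see how the anchors and the group interact, here is a small sketch comparing the pattern with and without the group. The variable names are illustrative only. It also shows one Python-specific detail: $ matches just before a trailing newline, so re.fullmatch (which needs no anchors) may be the safer choice if cell values can end in "\n".

```python
import re

grouped = re.compile(r'^(?:[uµ]m|mi(?:cro)?)$')
# Without the group, | splits the whole pattern into
# "^[uµ]m" OR "mi(?:cro)?$", so each anchor guards only one side.
ungrouped = re.compile(r'^[uµ]m|mi(?:cro)?$')

# 'umpire' starts with "um", so the first ungrouped alternative accepts it.
umpire_ungrouped = bool(ungrouped.search('umpire'))
umpire_grouped = bool(grouped.search('umpire'))

# $ also matches right before a trailing newline; fullmatch does not allow that.
newline_search = bool(grouped.search('um\n'))
newline_fullmatch = bool(re.fullmatch(r'[uµ]m|mi(?:cro)?', 'um\n'))

print(umpire_ungrouped, umpire_grouped)   # True False
print(newline_search, newline_fullmatch)  # True False
```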

Are there any spaces around the value, as in 'um ', ' um', or ' um '? You could use that as a boundary.

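import re

your_string = 'um yum umpire µm mi micro'

# \s* alone would still find 'um' inside 'yum' and 'umpire', so use
# lookarounds that require whitespace (or the string edge) on both sides.
rx = re.compile(r'(?<!\S)(um|µm|mi|micro)(?!\S)')

# findall returns every standalone token: ['um', 'µm', 'mi', 'micro']
matches = rx.findall(your_string)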

Something like this? I would need more specifics about your string to say for sure.
