python regex - find one of several optional groupings of characters

Question

I have the following (discrete) strings:

um
yum
umpire
µm
mi
micro

These strings would be found as-is, not as part of a longer text (they are possible cell values in a spreadsheet).

I wish to find all strings that are either "um" or "µm" or "mi" or "micro" (but not umpire or yum)

I am struggling to understand how to test for groups of characters. Here is what I have:

[(um)(µm)(mi)]

I've also tried variations, such as:

^[(?:um)|(?:µm)|(?:mi)]

But haven't yet found the magic.
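For reference, here is why these attempts fail. Inside square brackets a regex defines a character class, which matches exactly one character from the listed set. Parentheses lose their grouping meaning there, and in the second attempt `?`, `:` and `|` are likewise just literal characters. The check below is a minimal Python sketch using the first attempt:

```python
import re

# Inside [...], parentheses are ordinary characters, so this class is just
# the set {(, u, m, ), µ, i} and matches exactly ONE character
attempt = re.compile(r'[(um)(µm)(mi)]')

print(bool(attempt.fullmatch('um')))  # False: 'um' is two characters long
print(bool(attempt.fullmatch('u')))   # True: one character from the set
print(bool(attempt.fullmatch(')')))   # True: ')' is a literal inside the class
print(bool(attempt.search('yum')))    # True: search finds the single 'u'
```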


The desired outcome is that, of the strings above, only the following return True:

"um", "µm", "mi", "micro"

Answer 1

You may use this regex with anchors:

^(?:[uµ]m|mi(?:cro)?)$
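A quick way to check this pattern against all of the sample cell values in Python (a sketch; the list `cells` simply repeats the strings from the question):

```python
import re

# The anchors force the whole cell value to match the pattern
pattern = re.compile(r'^(?:[uµ]m|mi(?:cro)?)$')

cells = ['um', 'yum', 'umpire', 'µm', 'mi', 'micro']
results = {cell: bool(pattern.match(cell)) for cell in cells}
print(results)
# {'um': True, 'yum': False, 'umpire': False, 'µm': True, 'mi': True, 'micro': True}
```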


RegEx Details:

^ : Start of the string
(?: : Start a non-capturing group. It groups the two alternatives together but does not save the matched text, which isn't needed for a simple True/False test
- [uµ]m : Match u or µ followed by m, i.e. matching um and µm
- | : OR
- mi(?:cro)? : Match mi, optionally followed by cro, i.e. matching mi and micro
) : End of the non-capturing group
$ : End of the string

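We use the ^ and $ anchors so that a string matches if and only if the pattern covers the entire string. The alternatives must sit inside a group so that both anchors apply to each of them; without it, ^ would bind only to [uµ]m and $ only to mi(?:cro)?. The ?: simply makes that group non-capturing.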
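To see why the group matters, the sketch below compares the pattern with and without it. It also shows `re.fullmatch` (available since Python 3.4), which requires the whole string to match without writing `^` and `$`. One difference worth knowing: `$` also matches just before a trailing newline, whereas `fullmatch` does not. Neither variant is part of the original answer.

```python
import re

grouped = re.compile(r'^(?:[uµ]m|mi(?:cro)?)$')
# Without the group, ^ binds only to [uµ]m and $ only to mi(?:cro)?,
# so any string that merely starts with um/µm (or ends with mi/micro) matches
ungrouped = re.compile(r'^[uµ]m|mi(?:cro)?$')

print(bool(ungrouped.match('umpire')))  # True: '^[uµ]m' matches the leading 'um'
print(bool(grouped.match('umpire')))    # False

# fullmatch anchors both ends implicitly
print(bool(re.fullmatch(r'[uµ]m|mi(?:cro)?', 'micro')))  # True
print(bool(grouped.match('um\n')))                       # True: $ also matches before a final newline
print(bool(re.fullmatch(r'[uµ]m|mi(?:cro)?', 'um\n')))   # False
```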

Answer 2

Are there any spaces around the values, e.g. 'um ', ' um', or ' um '? If so, you could use the whitespace as a boundary.

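import re

your_string = 'um yum umpire µm mi micro'

# (?<!\S) and (?!\S) require whitespace (or the start/end of the string) on
# each side, so the 'um' inside 'yum' or 'umpire' is not matched
rx = re.compile(r'(?<!\S)(um|µm|mi|micro)(?!\S)')

s = rx.findall(your_string)  # ['um', 'µm', 'mi', 'micro']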

Something like this? I'd need more specifics about your strings to be sure.
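If values may be separated by punctuation as well as spaces, a word-boundary pattern is one alternative. This is an illustrative sketch, not part of either answer; the sample `text` is made up by adding a comma and a semicolon to the question's strings. It assumes Python 3 `str` patterns, where `\b` is Unicode-aware and µ counts as a word character.

```python
import re

text = 'um, yum; umpire µm mi micro'
# \b marks a transition between a word character and a non-word character,
# so 'um' inside 'yum' or 'umpire' lacks a boundary on one side and is skipped
word_rx = re.compile(r'\b(?:um|µm|mi|micro)\b')

print(word_rx.findall(text))  # ['um', 'µm', 'mi', 'micro']
```

Note that `\b` treats hyphens and other punctuation as boundaries too, so a value like `um-x` would yield `um`; use `(?<!\S)` and `(?!\S)` instead if only whitespace should count as a separator.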


2 answers

solution1: score 3, accepted, 2019-08-09 18:37:54

solution2: score 0, 2019-08-09 18:42:12
