I have the following (discrete) strings:
um
yum
umpire
µm
mi
micro
These strings would be found as-is, not as part of a longer text. (They are possible cell values in a spreadsheet.)
I wish to find all strings that are either "um" or "µm" or "mi" or "micro" (but not "umpire" or "yum").
I am struggling with understanding testing for character groupings. Here is what I have:
[(um)(µm)(mi)]
I've also tried variations, such as:
^[(?:um)|(?:µm)|(?:mi)]
But haven't yet found the magic.
The desired outcome is for the following strings (from the list above) to return True:
"um", "µm", "mi", "micro"
You may use this regex with anchors:
^(?:[uµ]m|mi(?:cro)?)$
RegEx Details:
^ : Start of the string
(?: : Start of a non-capturing group. It groups the alternatives without capturing the matched text, which is all that is needed for a true/false test
[uµ]m : Match u or µ followed by m, i.e. um or µm
| : OR
mi(?:cro)? : Match mi, optionally followed by cro, i.e. mi or micro
) : End of the non-capturing group
$ : End of the string
We use the ^ and $ anchors to ensure that there is a match if and only if the pattern accounts for the entire string. That is also why the alternatives are wrapped in (?: ... ): without the group, ^ would apply only to the first alternative and $ only to the last.
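As a quick check, here is a minimal Python sketch that runs the anchored pattern above against the six strings from the question. For contrast, it also runs the bracketed attempt from the question, which is a character class and so matches any single one of the characters listed inside the brackets. The names UNIT_RX, CLASS_RX and candidates are illustrative only and do not come from the original post.

```python
import re

# The anchored pattern from the answer above.
UNIT_RX = re.compile(r'^(?:[uµ]m|mi(?:cro)?)$')

# The six example strings from the question.
candidates = ['um', 'yum', 'umpire', 'µm', 'mi', 'micro']

# True only when the whole string is one of um, µm, mi, micro.
results = {s: bool(UNIT_RX.search(s)) for s in candidates}
for s, matched in results.items():
    print(f'{s!r}: {matched}')

# The attempt from the question: square brackets form a character class,
# so this matches any ONE of the characters ( u m ) µ i anywhere in the
# string. Parentheses inside brackets are literal characters, not groups.
CLASS_RX = re.compile(r'[(um)(µm)(mi)]')
class_results = {s: bool(CLASS_RX.search(s)) for s in candidates}
```

Note that in Python, $ also matches just before a trailing newline, so a value such as 'um\n' would pass; re.fullmatch with the pattern minus its anchors is a stricter alternative if that matters.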
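Are there any spaces around the value, as in 'um ', ' um', or ' um '? If so, you could use whitespace as a boundary.
import re
your_string = 'um yum umpire µm mi micro'
# Require whitespace (or the string edge) on both sides so 'yum' and 'umpire' are skipped
rx = re.compile(r'(?<!\S)(?:um|µm|micro|mi)(?!\S)')
matches = rx.findall(your_string)  # ['um', 'µm', 'mi', 'micro']
Something like this? More specifics about your strings would help.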
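If the values really are individual spreadsheet cells that may carry stray spaces, one possible approach (an assumption about the data, not something the question confirms) is to strip each cell and then require a full match. The helper name is_unit and the sample cells below are hypothetical.

```python
import re

# Same alternatives as in the thread; no anchors needed because
# fullmatch() already requires the whole string to match.
UNIT_RX = re.compile(r'(?:[uµ]m|mi(?:cro)?)')

def is_unit(cell: str) -> bool:
    """Return True if the cell, ignoring surrounding whitespace, is um, µm, mi or micro."""
    return UNIT_RX.fullmatch(cell.strip()) is not None

# Hypothetical cell values, some with leading or trailing spaces.
cells = ['um ', ' um', ' um ', 'yum', 'umpire', ' µm', 'micro ']
flags = [is_unit(c) for c in cells]
print(list(zip(cells, flags)))
```

Stripping first keeps the pattern itself simple, since whitespace handling does not have to be encoded in the regex.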