I am new to using regular expressions and have been trying to figure out a way of selecting an element of a list which contains two words separated by whitespace.
I have the following dummy list: ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
I would like only element 3 matched ('word two <= 0.01')
I have tried using \b\w+(?=\s)\b, which I pieced together from related questions on Stack Overflow. I know this doesn't work, as there is whitespace after the second word (before <=), but I am stuck trying to figure out how to fix it.
Here is an example of my code:
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = '\b\w+(?=\s)\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)
To match a string that starts with one or more word characters, then one or more whitespace characters, then one or more word characters again, you may use:
import re
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = r'\w+\s+\w+\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)
# => ['word two <= 0.01']
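Note that re.match already anchors the match at the start of the string, hence there is no ^ in the regex above. Also, because you used a regular string literal, each \b in your pattern is a backspace character, not a word boundary pattern; use a raw string literal (r'...') so the backslashes reach the regex engine intact.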
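The difference between the two literal types can be checked directly. The following is a minimal sketch; the pattern '\bword\b' and the variable names plain and raw are chosen only for illustration and do not come from the question.

```python
import re

# Regular literal: Python turns each \b into the backspace character (\x08)
# before the regex engine ever sees the pattern.
plain = '\bword\b'
# Raw literal: the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bword\b'

print(len(plain), len(raw))           # 6 8
print(plain == '\x08word\x08')        # True
print(re.search(plain, 'word two'))   # None: the text has no backspace characters
print(re.search(raw, 'word two'))     # a match object for 'word'
```

So the regular literal silently produces a pattern that looks for literal backspace characters, which is why it matches nothing in the list.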
If you need to match a string that has word char + whitespace(s) + word char anywhere in the string, replace re.match with re.search, and you may even shorten the pattern to r'\w\s+\w'. Or, if you really need to check word boundaries, use r'\b\w+\s+\w+\b'.
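To see how the two functions differ, here is a small sketch. It assumes one extra, hypothetical list item ('<= word two') that is not in the question, added only so that the two-word sequence appears somewhere other than the start of the string.

```python
import re

# The last item is a hypothetical addition for illustration only.
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01', '<= word two']

pattern = r'\b\w+\s+\w+\b'

# re.match only tries the pattern at the start of each string.
matched_at_start = [s for s in example_list if re.match(pattern, s)]
# re.search scans the whole string for the first position where the pattern fits.
matched_anywhere = [s for s in example_list if re.search(pattern, s)]

print(matched_at_start)   # ['word two <= 0.01']
print(matched_anywhere)   # ['word two <= 0.01', '<= word two']
```

With re.search, the shorter r'\w\s+\w' selects the same items from this list, since it only needs one word character on each side of the whitespace.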