Python regex to find sequences matching: word + whitespace + word

I am new to regular expressions and have been trying to figure out how to select the elements of a list that contain two words separated by whitespace.

I have the following dummy list: ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']

I would like only the third element to match ('word two <= 0.01').

I have tried using \b\w+(?=\s)\b, which I pieced together from other related questions on Stack Overflow. I know this doesn't work, because there is whitespace after the second word (before <=), but I am stuck trying to figure out how to fix it.

Here is an example of my code:

import re

example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']

new_list = []

regex = '\b\w+(?=\s)\b'

for i in example_list:
    if re.match(regex, i):
        new_list.append(i)

print(new_list)

To match a string that starts with one or more word characters, followed by one or more whitespace characters and then one or more word characters, you may use:

import re
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = r'\w+\s+\w+\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)
# => ['word two <= 0.01']


Note that re.match only looks for a match at the start of the string, so no ^ is needed in the regex above. Also, because you used a regular string literal rather than a raw string, each \b in your pattern is a backspace character, not a word boundary.
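The difference between the two literal types can be checked directly. The following is a minimal sketch; the names plain and raw and the sample text are made up for illustration:

```python
import re

# Regular literal: Python turns each \b into the backspace character (\x08)
# before the regex engine ever sees the pattern.
plain = '\bword\b'

# Raw literal: the backslashes are kept, so the regex engine receives \b
# and treats it as a word boundary.
raw = r'\bword\b'

print(len(plain), len(raw))             # 6 8
print(plain[0] == '\x08')               # True
print(re.search(plain, 'a word here'))  # None (the text has no backspace chars)
print(re.search(raw, 'a word here').group())  # word
```

Writing patterns as raw strings (the r prefix) avoids this class of problem.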

If you need to match a string that has word char + whitespace(s) + word char anywhere in the string, replace re.match with re.search; in that case the shorter r'\w\s+\w' is enough. Or, if you really need to check word boundaries, use r'\b\w+\s+\w+\b'.
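As a rough illustration of how re.match and re.search differ here, the sketch below adds a fourth, made-up element ('0.01 >= word two') in which the two words do not come first. The variable names are assumptions for the example only:

```python
import re

# The last element is hypothetical, added to show a pair of words
# that does not sit at the start of the string.
example_list = ['word <= 0.02', 'word_one <= 0.04',
                'word two <= 0.01', '0.01 >= word two']

at_start = re.compile(r'\w+\s+\w+\b')    # used with match(): start of string only
loose = re.compile(r'\w\s+\w')           # used with search(): anywhere
strict = re.compile(r'\b\w+\s+\w+\b')    # used with search(): whole words, anywhere

matched_at_start = [s for s in example_list if at_start.match(s)]
found_loose = [s for s in example_list if loose.search(s)]
found_strict = [s for s in example_list if strict.search(s)]

print(matched_at_start)  # ['word two <= 0.01']
print(found_loose)       # ['word two <= 0.01', '0.01 >= word two']
print(found_strict)      # ['word two <= 0.01', '0.01 >= word two']
```

Keep in mind that \w also matches digits and underscores, so a string such as '12 34' would count as two "words" with any of these patterns.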
