I would like to match all the words in a string containing
For example (this is the best result I have got so far):
import re
test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"
p = re.compile(r"(\S*[A-Z_]\S*[_]\S*)")
print(p.search(test_string))
The words I would like to obtain from the search method are:
But I am obtaining
Thank you
Your regex (\S*[A-Z_]\S*[_]\S*) uses \S*, which matches a non-whitespace character and repeats that 0+ times, so you would, for example, also match __ or A_.
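A quick way to see this is to run the question's pattern on a couple of short strings. The inputs "__" and "A_" below are only illustrative; search() reports just the first hit, while findall() lists everything the pattern accepts in the sample string.

```python
import re

# The pattern from the question: \S* can swallow any non-whitespace run,
# so only one [A-Z_] character followed later by one "_" is really required.
original = re.compile(r"(\S*[A-Z_]\S*[_]\S*)")

# Illustrative inputs that are accepted even though they are not
# uppercase words: in "__" both [A-Z_] and [_] match an underscore.
print(original.fullmatch("__") is not None)  # True
print(original.fullmatch("A_") is not None)  # True

test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"

# search() stops at the first match; findall() returns every match.
print(original.search(test_string).group())  # 'TEST_STRING'
print(original.findall(test_string))
# ['TEST_STRING', 'TEST_string', '_TEST_STRING_', 'ANOTHER_TEST_STRING']
```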
You might use:
\b[A-Z]+_[A-Z_]*[A-Z]\b
Explanation
\b  Word boundary
[A-Z]+  Match 1+ uppercase chars
_  Match underscore
[A-Z_]*  Match 0+ times either an uppercase char or an underscore
[A-Z]  Match an uppercase char
\b  Word boundary

re.search will return only the first location where the regex matches. You could use findall instead:
import re
test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"
p = re.compile(r"\b[A-Z]+_[A-Z_]*[A-Z]\b")
print(p.findall(test_string))
Result
['TEST_STRING', 'ANOTHER_TEST_STRING']
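As a sketch, the same pattern can be written with re.VERBOSE so that each token carries the comment from the explanation above. The name upper_snake is just an illustrative choice; the snippet also contrasts search(), which stops at the first match, with finditer(), which yields every match together with its position.

```python
import re

# Same pattern as above. In verbose mode, unescaped whitespace in the
# pattern is ignored and "#" starts a comment (outside character classes).
upper_snake = re.compile(
    r"""
    \b        # word boundary
    [A-Z]+    # 1+ uppercase chars
    _         # an underscore
    [A-Z_]*   # 0+ uppercase chars or underscores
    [A-Z]     # an uppercase char, so the word cannot end with "_"
    \b        # word boundary
    """,
    re.VERBOSE,
)

test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"

# search() returns a Match object for the first occurrence only ...
first = upper_snake.search(test_string)
print(first.group(), first.span())

# ... while finditer() yields every occurrence along with its span.
for m in upper_snake.finditer(test_string):
    print(m.group(), m.span())
```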
This should work:
import re
regex = r"\b([A-Z]+(?:_[A-Z]+){1,})\b"
test_str = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"
matches = re.findall(regex, test_str, re.MULTILINE)
Output:
>>> matches
['TEST_STRING', 'ANOTHER_TEST_STRING']
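The two answers' patterns agree on the sample string but are not equivalent. In the first, [A-Z_]* accepts consecutive underscores; in the second, (?:_[A-Z]+) requires at least one uppercase letter after every underscore. A minimal comparison is sketched below; FOO__BAR is a hypothetical input chosen only to show the difference, and {1,} is written as its shorthand +.

```python
import re

# Pattern from the first answer: [A-Z_]* allows several underscores in a row.
pattern_a = re.compile(r"\b[A-Z]+_[A-Z_]*[A-Z]\b")
# Pattern from the second answer without the outer group, with {1,} as +.
# re.MULTILINE is omitted: it only affects ^ and $, which are not used here.
pattern_b = re.compile(r"\b[A-Z]+(?:_[A-Z]+)+\b")

test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"

# Both patterns agree on the question's sample string ...
print(pattern_a.findall(test_string))  # ['TEST_STRING', 'ANOTHER_TEST_STRING']
print(pattern_b.findall(test_string))  # ['TEST_STRING', 'ANOTHER_TEST_STRING']

# ... but differ on a word that contains a double underscore.
print(pattern_a.findall("FOO__BAR"))  # ['FOO__BAR']
print(pattern_b.findall("FOO__BAR"))  # []
```

Which behaviour is right depends on whether names like FOO__BAR should count as matches.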