
How to find words in a string containing at least one underscore and capital letters

I would like to match all the words in a string containing

  1. at least one underscore (but the word can neither start nor end with it)
  2. at least two uppercase letters
  3. all the letters must be uppercase.

For example (this is the best result I have got so far):

import re

test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"
p = re.compile(r"(\S*[A-Z_]\S*[_]\S*)")
p.search(test_string)

The words I would like to obtain from the search method are:

  1. TEST_STRING (the second word, not the substring of _TEST_STRING_)
  2. ANOTHER_TEST_STRING

But I am obtaining

  1. TEST_STRING
  2. TEST_STRING (which is the substring of _TEST_STRING_).

Thank you

Your regex (\S*[A-Z_]\S*[_]\S*) uses \S*, which matches a non-whitespace character zero or more times, so it would also match, for example, __ or A_.
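To make that concrete, here is a small sketch that runs the pattern from the question against a few probe strings. The probe strings and the names original, probes, and loose_matches are picked only for illustration and are not part of the question.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"(\S*[A-Z_]\S*[_]\S*)")

# Illustrative probe strings (assumed, not taken from the question).
probes = ["__", "A_", "_TEST_STRING_", "TEST_STRING", "TESTSTRING"]

# fullmatch asks whether the pattern can consume the whole probe string.
loose_matches = {s: bool(original.fullmatch(s)) for s in probes}

for word, matched in loose_matches.items():
    print(f"{word!r}: {matched}")
```

Only TESTSTRING is rejected here, because it has no underscore at all; the unwanted forms __, A_ and _TEST_STRING_ are all accepted.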

You might use:

\b[A-Z]+_[A-Z_]*[A-Z]\b

Explanation

  • \b Word boundary
  • [A-Z]+ Match 1+ uppercase chars
  • _ Match underscore
  • [A-Z_]* Match 0+ times either an uppercase char or an underscore
  • [A-Z] Match an uppercase char
  • \b Word boundary
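The same breakdown can be written as a commented pattern with re.VERBOSE, which may be easier to maintain. This is only a sketch: the edge-case words and the names pattern, edge_cases, and verdicts are assumptions added for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
pattern = re.compile(
    r"""
    \b          # word boundary (an underscore counts as a word character)
    [A-Z]+      # one or more uppercase letters
    _           # a literal underscore
    [A-Z_]*     # zero or more uppercase letters or underscores
    [A-Z]       # a final uppercase letter
    \b          # word boundary
    """,
    re.VERBOSE,
)

# Illustrative edge cases (assumed, not from the question).
edge_cases = ["A_B", "A__B", "TEST_STRING2", "Test_STRING", "_A_B", "A_B_"]
verdicts = {w: bool(pattern.fullmatch(w)) for w in edge_cases}

for word, ok in verdicts.items():
    print(f"{word!r}: {ok}")
```

Note that A__B is accepted, since [A-Z_]* allows consecutive underscores; whether that is desirable depends on the data.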

re.search returns only the first location where the regex matches. To get every match, you could use findall instead:

import re
test_string = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"
p = re.compile(r"\b[A-Z]+_[A-Z_]*[A-Z]\b") 
print(re.findall(p,test_string))

Result

['TEST_STRING', 'ANOTHER_TEST_STRING']


This should work:

import re

regex = r"\b([A-Z]+(?:_[A-Z]+){1,})\b"
test_str = "test_string TEST_STRING TEST_string _TEST_STRING_ TESTSTRING ANOTHER_TEST_STRING"
matches = re.findall(regex, test_str, re.MULTILINE)

Output:

>>> matches
['TEST_STRING', 'ANOTHER_TEST_STRING']
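The two suggested patterns agree on the test string from the question, but they are not equivalent. The sketch below compares them on an assumed sample that contains a double underscore. Here {1,} is written as the equivalent + and the capturing group is made non-capturing, which does not change what findall returns in this case. The names first, second, and sample are illustrative.

```python
import re

# Pattern from the first answer.
first = re.compile(r"\b[A-Z]+_[A-Z_]*[A-Z]\b")
# Pattern from the second answer, with {1,} rewritten as + and no capture group.
second = re.compile(r"\b[A-Z]+(?:_[A-Z]+)+\b")

# Illustrative sample (assumed); A__B contains two consecutive underscores.
sample = "TEST_STRING A__B ANOTHER_TEST_STRING _X_Y_"

first_result = first.findall(sample)
second_result = second.findall(sample)

print(first_result)
print(second_result)
```

The first pattern accepts A__B, while the second requires at least one uppercase letter after every underscore and therefore skips it.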
