
How to find all words followed by symbol using Python Regex?

I need re.findall to detect words that are followed by a "="

So it works for an example like

re.findall(r'\w+(?=[=])', "I think Python=amazing")

but it won't work for "I think Python = amazing" or "Python =amazing". I don't know how to handle the optional whitespace around the "=" properly.

Thanks a bunch!

You said "Again stuck in the regex", probably in reference to your earlier question, "Looking for a way to identify and replace Python variables in a script". You got answers to the question you asked there, but I don't think you asked the question you really wanted answered.

You are looking to refactor Python code, and unless your tool understands Python, it will generate false positives and false negatives; that is, finding instances of variable = that aren't assignments and missing assignments that aren't matched by your regexp.
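To make the false-positive problem concrete, here is a minimal sketch that compares the regex with Python's standard ast module. The sample script in SOURCE and the helper name assigned_names are made up for illustration, and the helper only handles plain `name = value` statements, not every form of assignment.

```python
import ast
import re

# Hypothetical sample script: the '=' inside the string literal and the
# keyword argument z=3 are not variable assignments.
SOURCE = '''
x = 1
print("y = 2")
f = dict(z=3)
'''

def assigned_names(source):
    """Collect names bound by plain assignment statements (illustrative only)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.append(target.id)
    return names

regex_hits = re.findall(r'\w+(?=\s*=)', SOURCE)
ast_hits = assigned_names(SOURCE)

print(regex_hits)        # the regex also reports 'y' and 'z'
print(sorted(ast_hits))  # the parser reports only real assignment targets
```

The regex cannot tell a string literal or a keyword argument from an assignment, which is why a tool that parses Python is the safer choice for refactoring.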

There is a partial list of tools at "What refactoring tools do you use for Python?", and more general searches for "refactoring Python your_editing_environment" will yield more still.

'(\w+)\s*=\s*'
re.findall(r'(\w+)\s*=\s*', 'I think Python=amazing')    # returns ['Python']
re.findall(r'(\w+)\s*=\s*', 'I think Python = amazing')  # returns ['Python']
re.findall(r'(\w+)\s*=\s*', 'I think Python =amazing')   # returns ['Python']

Just add some optional whitespace before the =:

\w+(?=\s*=)
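As a quick check, the following sketch runs that lookahead pattern over the three strings from the question, plus one made-up string with two assignments to show that findall returns every match.

```python
import re

# A word, then optional whitespace and '=' that are checked by the
# lookahead but not consumed as part of the match.
pattern = re.compile(r'\w+(?=\s*=)')

samples = [
    "I think Python=amazing",
    "I think Python = amazing",
    "I think Python =amazing",
    "a = 1, b=2",  # illustrative extra input, not from the question
]

results = [pattern.findall(s) for s in samples]
for sample, found in zip(samples, results):
    print(repr(sample), '->', found)
```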

Use this instead

 re.findall('^(.+)(?=[=])', "I think Python=amazing")

Explanation

# ^(.+)(?=[=])
# 
# Options: case insensitive
# 
# Assert position at the beginning of the string «^»
# Match the regular expression below and capture its match into backreference number 1 «(.+)»
#    Match any single character that is not a line break character «.+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=[=])»
#    Match the character “=” «[=]»
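Because of the ^ anchor and the greedy .+, this pattern appears to capture everything from the start of the string up to the last "=", rather than a single word. A small sketch to check that behaviour (the "a=b=c" input is made up for illustration):

```python
import re

pattern = r'^(.+)(?=[=])'

one = re.findall(pattern, "I think Python=amazing")
spaced = re.findall(pattern, "I think Python = amazing")
chained = re.findall(pattern, "a=b=c")

print(one)      # ['I think Python']
print(spaced)   # ['I think Python ']  (trailing space is kept)
print(chained)  # ['a=b']  (greedy: stops before the last '=')
```

If only the word immediately before the "=" is wanted, the result still needs further processing, for example splitting on whitespace.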

You need to allow for whitespace between the word and the = :

re.findall(r'\w+(?=\s*[=])', "I think Python = amazing")

You can also simplify the expression by using a capturing group around the word, instead of a lookahead around the equals sign:

re.findall(r'(\w+)\s*=', "I think Python = amazing")
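The two forms give the same findall result, but they differ in what the overall match covers. A minimal sketch using re.search to show the difference:

```python
import re

text = "I think Python = amazing"

look = re.search(r'\w+(?=\s*=)', text)  # lookahead: '=' is not consumed
cap = re.search(r'(\w+)\s*=', text)     # capturing group: '=' is consumed

print(look.group(0))  # Python
print(cap.group(0))   # Python =
print(cap.group(1))   # Python

# findall returns the whole match when there is no group, and the group
# contents when there is exactly one group, so both give the same list.
same = re.findall(r'\w+(?=\s*=)', text) == re.findall(r'(\w+)\s*=', text)
print(same)
```

The distinction matters mainly with re.sub or when match positions are needed, since the capturing version also consumes the whitespace and the "=".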

r'(.*)=.*' would do it as well.

The pattern is anything #1, followed by an =, followed by anything #2; the capturing group returns anything #1.

>>> re.findall(r'(.*)=.*', "I think Python=amazing")
['I think Python']
>>> re.findall(r'(.*)=.*', "  I think Python =    amazing oh yes very amazing   ")
['  I think Python ']
>>> re.findall(r'(.*)=.*', "=  crazy  ")
['']

Then you can call strip() on the string in the returned list.
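A short sketch of that last step, reusing the second input from the session above. The variable names are illustrative, and taking the last whitespace-separated token to get the single word before the "=" is an added assumption about what the asker wants.

```python
import re

text = "  I think Python =    amazing oh yes very amazing   "

left = re.findall(r'(.*)=.*', text)[0]  # everything before the '='
stripped = left.strip()                 # drop surrounding whitespace

# The word directly before '=' is the last token, if there is one.
words = left.split()
last_word = words[-1] if words else None

print(repr(left))      # '  I think Python '
print(repr(stripped))  # 'I think Python'
print(last_word)       # Python
```

The guard on words covers inputs such as "=  crazy  ", where the captured text is empty.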

re.split(r'\s*=', "I think Python=amazing")[0].split() # returns ['I', 'think', 'Python']
