Python Regex Search for Word with Non-alpha Characters in the Middle

Question

I need to find the indices at which a lowercase word (letters a-z) occurs in a string. However, the string might have a bunch of non-alpha characters within the word.

For example, the word "dont" spans indices [0, 5) in the phrase "don't do that."

I searched around for ways to match non-alpha characters and achieved this with the following regex:

>>> import re
>>> pattern = re.compile("d[^a-z]*o[^a-z]*n[^a-z]*t[^a-z]*")
>>> test = "don't"
>>> pattern.search(test).start()
0
>>> pattern.search(test).end()
5
>>> test = "d'o&&&&&n't"
>>> pattern.search(test).start()
0
>>> pattern.search(test).end()
11
>>>

Is there a more concise way to express this regex? Or would I have to write code to insert [^a-z]* between every character in every word I want to search for?

Sorry if this question already exists - I don't know exactly how to phrase this question. Thanks for the help.

Answer 1

You can match any lowercase word this way, by repeating a non-capturing group:

(?:[a-z][^a-z]*)+
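
To see what the generic pattern does in practice, here is a minimal sketch using the phrase from the question (the variable names are just for illustration). Note that it does not isolate one particular word: it keeps consuming letters and separators for as long as it can.

```python
import re

# Generic pattern from above: one lowercase letter followed by any number
# of non-lowercase characters, repeated one or more times.
generic = re.compile(r"(?:[a-z][^a-z]*)+")

match = generic.search("don't do that.")
span = match.span()
print(span)
```

On this input the match runs over the whole phrase, span (0, 14), so the pattern is only useful if you do not care which word was found.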

Alternatively, you can automate this regex for every given word:

>>> word = 'dont'
>>> regex = ''.join(x + '[^a-z]*' for x in word)
>>> regex
'd[^a-z]*o[^a-z]*n[^a-z]*t[^a-z]*'
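
One caveat with the generated regex: the trailing [^a-z]* after the last letter also swallows whatever follows the word, so on "don't do that." the match would end at 6 rather than 5. A possible variation, as a sketch only, is to put the separator between letters instead of after each one. The helper names build_pattern and find_spans are made up for this example, and re.escape is only there in case the word ever contains regex metacharacters.

```python
import re

def build_pattern(word):
    # Hypothetical helper: allow any run of non-lowercase characters
    # between consecutive letters, but nothing after the last letter.
    return re.compile("[^a-z]*".join(re.escape(ch) for ch in word))

def find_spans(word, text):
    # Return the (start, end) span of every non-overlapping match.
    return [m.span() for m in build_pattern(word).finditer(text)]

print(find_spans("dont", "don't do that."))
print(find_spans("dont", "d'o&&&&&n't"))
```

This gives [(0, 5)] for the first phrase and [(0, 11)] for the second, which matches the spans described in the question.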

Answer 2

Yes, you will have to do it the way you showed if it is really your intention.

A regex only matches consecutive sequences of specific characters or character classes. It cannot know that d&&o should be treated as just d and o, since the characters in between must also be matched by something in the pattern.
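
If you would rather not build a pattern per word, one alternative (not a regex solution, just a sketch under the assumption that only lowercase ASCII letters count) is to strip everything else out while remembering each letter's original index, then do a plain substring search. The function name find_spans_by_stripping is invented for this example.

```python
def find_spans_by_stripping(word, text):
    # Assumes a non-empty word made of lowercase ASCII letters.
    if not word:
        return []
    # Keep only lowercase letters, remembering where each one came from.
    kept = [(i, ch) for i, ch in enumerate(text) if "a" <= ch <= "z"]
    letters = "".join(ch for _, ch in kept)
    spans = []
    start = letters.find(word)
    while start != -1:
        first = kept[start][0]                 # index of the first letter
        last = kept[start + len(word) - 1][0]  # index of the last letter
        spans.append((first, last + 1))        # half-open span, like re
        start = letters.find(word, start + 1)
    return spans

print(find_spans_by_stripping("dont", "don't do that."))
```

Unlike finditer, this version also reports overlapping occurrences, which may or may not be what you want.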

Answer 3

Try this:

pattern = re.compile(r"[^\w']|don't")


3 answers

solution1
2 ACCEPTED 2017-05-31 20:52:56

solution2
1 2017-05-31 21:14:33

solution3
0 2017-05-31 20:54:03
