I am trying to use Python's re module to match a string against a specific pattern. I have this expected sentence:
It is X. not X
X can be anything: a word, a bunch of words, a number, or digits.
The pattern I built is:
It is \w+. not \w+
which I get just by using
string.replace("X", "\w+")
It works if X
is a word, a bunch of words, or an int, but not for digits. How can I build my pattern so that it matches anything in place of X?
The .
is a special character in a regular expression that matches any character (except a newline, by default). So .+
will match one or more characters.
r"It is .+\. not .+"
Note that the period is escaped as \.
because in that case you want to match an actual period.
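A quick way to check this pattern is sketched below. The sample sentences and the name PATTERN are made up for illustration, and re.fullmatch is used on the assumption that the whole string should fit the template.

```python
import re

# Pattern from the answer above; the escaped \. matches a literal period.
PATTERN = re.compile(r"It is .+\. not .+")

# Hypothetical sample sentences: words, integers, decimals, and a non-match.
samples = [
    "It is a dog. not a cat",
    "It is 42. not 7",
    "It is 3.14. not 2.71",
    "It is a dog, not a cat",  # comma instead of the required period
]

# True where the whole sentence fits the template, False otherwise.
results = [bool(PATTERN.fullmatch(s)) for s in samples]
print(results)
```

Because .+ accepts any run of characters, the check does not care whether X is a word, several words, or a number.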
Because .+
won't work in some cases, for example:
It is quote. not a double-quote
It is a dog. not a cat
I would use this one instead:
(?<=It is ).+(?=\.)|(?<=not ).+$
Explanation
(?<=It is ).+(?=\.)
Any consecutive characters preceded by It is
and followed by a period
|
OR
(?<=not ).*$
Any consecutive characters preceded by not
and followed by the end-of-line anchor
(?<=It is ).*(?=\.)|(?<=not ).*$
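The lookaround version can be tried with re.findall, which returns the text matched for each X. This is only a sketch: the helper name extract_values and the sample sentences are hypothetical, and the period is written as \. so that it matches a literal period.

```python
import re

# Lookaround pattern from the answer, with the period escaped as \.
LOOKAROUND = re.compile(r"(?<=It is ).+(?=\.)|(?<=not ).+$")

def extract_values(sentence):
    """Return every non-overlapping match of the pattern, left to right."""
    return LOOKAROUND.findall(sentence)

words = extract_values("It is a dog. not a cat")
mixed = extract_values("It is 3.14. not 2")
# The greedy .+ runs up to the LAST period in the string, so a period
# inside the second value moves the split point.
greedy = extract_values("It is 3.14. not 2.71")

print(words)   # ['a dog', 'a cat']
print(mixed)   # ['3.14', '2']
print(greedy)  # ['3.14. not 2']
```

The last call shows a limitation worth knowing about: when the second value itself contains a period, the first match swallows everything up to that period and no second match is found.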
I figured it out: I can use str.replace("X", "(\\w+|\\d+\\.\\d+)")
to approach the problem. I hope this helps others who have the same issue.
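Here is a minimal sketch of that replace-based approach. The names X_PATTERN and build_pattern are hypothetical, and one step is an addition that is not in the post: the template is passed through re.escape first, so the literal period in "It is X. not X" only matches a period rather than any character.

```python
import re

# Sub-pattern from the post: a run of word characters, or a decimal number.
X_PATTERN = r"(\w+|\d+\.\d+)"

def build_pattern(template):
    """Turn a template such as "It is X. not X" into a compiled regex.

    Assumptions: the template is escaped before substitution (an addition
    to the post), and every capital X in it is a placeholder.
    """
    escaped = re.escape(template)  # leaves the letter X untouched
    return re.compile(escaped.replace("X", X_PATTERN))

pattern = build_pattern("It is X. not X")

word_match = pattern.fullmatch("It is dog. not cat")
decimal_match = pattern.fullmatch("It is 3.14. not 2.71")
# \w does not match spaces, so multi-word values are not covered.
phrase_match = pattern.fullmatch("It is a dog. not a cat")

print(word_match.groups())     # ('dog', 'cat')
print(decimal_match.groups())  # ('3.14', '2.71')
print(phrase_match)            # None
```

If X also has to cover several words, the sub-pattern would need to allow spaces, which is what the .+ based answers do.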