
Python using re to match string in a specific pattern

I am trying to use Python's re module to match a string against a specific pattern. My problem is that I have this expected sentence:

"It is X. not X"

X can be anything: a word, a group of words, a number, or digits.

The pattern I built is:

It is \w+. not \w+

which I get just by using

string.replace("X", r"\w+")

It works if X is a word, a group of words, or an int, but not for digits. How can I build my pattern so that it matches everything that fits this template?

The . is a special character in a regular expression that matches any character (except a newline, by default). So .+ matches one or more characters.

r"It is .+\. not .+"

Note that the period is escaped as \. because in that position you want to match a literal period.
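To see how that pattern behaves, here is a minimal sketch. It assumes the whole string has to match (hence re.fullmatch), and the sample sentences are made up for illustration:

```python
import re

# Pattern from the answer above: ".+" for each X, "\." for the literal period.
PATTERN = re.compile(r"It is .+\. not .+")

# Hypothetical sample inputs covering the cases mentioned in the question.
samples = [
    "It is cat. not dog",            # single words
    "It is a big cat. not a dog",    # several words
    "It is 42. not 7",               # integers
    "It is 3.14. not 2.71",          # values containing a period
    "It was cat. not dog",           # wrong prefix, should not match
]

# Map each sample to True/False depending on whether the full string matches.
results = {s: PATTERN.fullmatch(s) is not None for s in samples}
for sentence, matched in results.items():
    print(f"{sentence!r}: {matched}")
```

Only the last sample is rejected. If a partial match anywhere in a longer string is acceptable, re.search could be used instead of re.fullmatch.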

Because .+ won't work in some cases, for example:

It is quote. not a double-quote

It is a dog. not a cat

I would use this one instead:

(?<=It is ).+(?=\.)|(?<=not ).+$

Explanation

(?<=It is ).+(?=\.) Any consecutive characters preceded by "It is " and followed by a period

| OR

(?<=not ).+$ Any consecutive characters preceded by "not " and followed by the end-of-line anchor
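A rough sketch of how the lookaround pattern can be used to pull out both X parts with re.findall. The helper name extract_parts is made up for this example, and the inputs are the two sentences quoted above plus one with decimals:

```python
import re

# Lookbehind/lookahead version: each alternative matches only the X part,
# because the surrounding text is asserted but not consumed.
EXTRACT = re.compile(r"(?<=It is ).+(?=\.)|(?<=not ).+$")

def extract_parts(sentence):
    """Return the substrings matched by either alternative, in order."""
    return EXTRACT.findall(sentence)

words = extract_parts("It is a dog. not a cat")
quotes = extract_parts("It is quote. not a double-quote")
# Caveat: ".+" is greedy, so with more than one period in the input the
# first alternative runs up to the LAST period.
decimals = extract_parts("It is 3.14. not 2.71")

print(words)     # ['a dog', 'a cat']
print(quotes)    # ['quote', 'a double-quote']
print(decimals)  # ['3.14. not 2']
```

The last call shows that this pattern suits inputs with a single period; for values that themselves contain periods, a pattern that anchors on ". not " is likely a better fit.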


I figured it out: you can use str.replace("X", "(\\w+|\\d+\\.\\d+)") to solve the problem. I hope this helps others with the same issue.
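For reference, a small sketch of that approach end to end. The TEMPLATE name, the sample sentences, and the choice of re.fullmatch are assumptions for illustration, not part of the original post:

```python
import re

TEMPLATE = "It is X. not X"  # the expected sentence from the question

# Replace each X placeholder with a group that accepts either a word
# or a decimal number. Note that the "." from the template itself stays
# unescaped, so it still matches any single character.
pattern = TEMPLATE.replace("X", "(\\w+|\\d+\\.\\d+)")
print(pattern)

# Collect the captured X values for a couple of hypothetical inputs.
captured = {}
for sentence in ["It is cat. not dog", "It is 3.14. not 2.71"]:
    m = re.fullmatch(pattern, sentence)
    captured[sentence] = m.groups() if m else None
    print(sentence, "->", captured[sentence])
```

With re.fullmatch the engine backtracks from \w+ to the decimal alternative when needed, so both X values are captured whole. With re.match or re.search the second group could stop after the integer part.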
