
Python: capture a group with a regex and wrap it in quotes

I'm trying to use a regex to capture data from a file and wrap it in quotes. I want to capture anything between "Postal Code": and the comma that follows it. When I do the replacement, the result looks like "whateverdata, with no closing quote at the end. Why is that?

Data will look something like this: "State":"NC","Postal Code":27605,"Description":null,

My code:

import re

pattern = r'"Postal Code":(.+),'
replacement = r'"\1"'
jsonObj = re.sub(pattern, replacement, jsonObj)

Since this is JSON, is there a better way to go about this? It seems like it would be a common problem.

You need to either use a non-greedy match here (as @hwnd suggested in the comments):

r'"Postal Code":(.+?),'

Or, since you know that this is a postal code, match one or more digits:

r'"Postal Code":(\d+),'

Demo:

>>> import re
>>> pattern = re.compile(r'"Postal Code":(\d+),')
>>> source = '"State":"NC","Postal Code":27605,"Description":null,'
>>> pattern.search(source).group(1)
'27605'
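Both patterns only fix what gets captured. The replacement in the question, r'"\1"', also removes the "Postal Code": key and the trailing comma, because both are part of the match and re.sub replaces the whole match. A minimal sketch of the full substitution, reusing the sample line from the question (the variable names are illustrative, not from the original answer):

```python
import re

source = '"State":"NC","Postal Code":27605,"Description":null,'

# re.sub replaces the entire match, including the "Postal Code": key and the
# trailing comma, so the replacement string has to put both back.
replacement = r'"Postal Code":"\1",'

# Digits-only pattern from the answer above.
quoted = re.sub(r'"Postal Code":(\d+),', replacement, source)
print(quoted)  # "State":"NC","Postal Code":"27605","Description":null,

# The non-greedy pattern gives the same result on this input.
lazy_quoted = re.sub(r'"Postal Code":(.+?),', replacement, source)
print(lazy_quoted == quoted)  # True
```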

The problem is the greedy + quantifier. It matches as much as it can while still letting the rest of the regular expression match, so (.+) keeps going until the last comma in the line instead of stopping at the first one.
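A short sketch comparing the two quantifiers on the sample line from the question. It also shows why the closing quote seemed to be missing: with the greedy pattern it ends up at the end of the line, after null, rather than after the postal code.

```python
import re

source = '"State":"NC","Postal Code":27605,"Description":null,'

# Greedy: .+ runs to the end of the line, then backs off only as far as
# needed for the final "," to match, so it stops before the LAST comma.
greedy = re.search(r'"Postal Code":(.+),', source).group(1)
print(greedy)  # 27605,"Description":null

# Non-greedy: .+? stops at the FIRST comma that lets the pattern match.
lazy = re.search(r'"Postal Code":(.+?),', source).group(1)
print(lazy)  # 27605

# The question's substitution with the greedy pattern: the closing quote
# lands after null, not after the postal code.
question_result = re.sub(r'"Postal Code":(.+),', r'"\1"', source)
print(question_result)  # "State":"NC","27605,"Description":null"
```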

Use +? for a non-greedy (lazy) match, meaning "one or more, but as few as possible".

pattern = r'"Postal Code":(.+?),'
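On the question's follow-up: if the file holds complete, valid JSON, parsing it with the standard json module avoids regex edge cases entirely. A minimal sketch, assuming the data is a single JSON object (the snippet in the question is only a fragment, so the surrounding braces here are an assumption):

```python
import json

# Assumption: the file contains one complete JSON object like this one.
raw = '{"State":"NC","Postal Code":27605,"Description":null}'

data = json.loads(raw)

# Convert the postal code to a string only if it is present and not null.
if data.get("Postal Code") is not None:
    data["Postal Code"] = str(data["Postal Code"])

# Compact separators keep the output close to the original formatting.
result = json.dumps(data, separators=(",", ":"))
print(result)  # {"State":"NC","Postal Code":"27605","Description":null}
```

For a real file, json.load(f) and json.dump(data, f) do the same thing with file objects.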

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM