
How do I extract a value inside quotes using regex in Python?

My text is

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

I am trying to extract the value of posted_data, which is 2e54eba66f8f2881c8e78be8342428xd.

My code:

extract_posted_data = re.search(r'(\"posted_data\": \")(\w*)', my_text)
print (extract_posted_data)

It prints None.

Thank you

This particular example doesn't seem like it needs regular expressions at all.

>>> my_text
'"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
>>> import json
>>> result = json.loads('{%s}' % my_text)
>>> result
{'posted_data': '2e54eba66f8f2881c8e78be8342428xd', 'isropa': False, 'rx': 'NO', 'readal': 'false'}
>>> result['posted_data']
'2e54eba66f8f2881c8e78be8342428xd'
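
If the text might not always be valid JSON, the same idea can be wrapped in a small helper. This is only a sketch: the name value_from_fragment and its default parameter are made up for illustration, and it assumes the text is a comma-separated list of JSON key/value pairs without the surrounding braces.

```python
import json


def value_from_fragment(fragment, key, default=None):
    """Parse a brace-less JSON object fragment and return the value under key.

    Hypothetical helper: returns `default` when the fragment is not valid
    JSON or when the key is absent.
    """
    try:
        # Wrap the fragment in braces so it becomes a complete JSON object.
        data = json.loads('{%s}' % fragment)
    except json.JSONDecodeError:
        return default
    return data.get(key, default)


my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
print(value_from_fragment(my_text, 'posted_data'))
```

Unlike a regex, this keeps the JSON types, so isropa comes back as the boolean False rather than a string.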

With BeautifulSoup:

>>> import json
>>> from bs4 import BeautifulSoup
>>> # Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
>>> soup = BeautifulSoup('<script type="text/javascript"> "posted_data":"2738273283723hjasda" </script>', 'html.parser')
>>> result = json.loads('{%s}' % soup.script.text)
>>> result
{'posted_data': '2738273283723hjasda'}
>>> result['posted_data']
'2738273283723hjasda'

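One option is to change your regex to use lookarounds, so that only the value itself is matched:

import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
# The lookbehind and lookahead assert the surrounding text without including it in the match.
extract_posted_data = re.search(r'(?<="posted_data":")\w*(?=")', my_text)
print(extract_posted_data[0])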

This prints 2e54eba66f8f2881c8e78be8342428xd.

Note that re.search() returns a Match object (or None if nothing matches), not a string. Index 0 of the match, equivalent to .group(0), is the full matched text.
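
Because re.search() can return None, it is safer to check the result before indexing it. A minimal sketch, using a shortened copy of the text from the question; the key no_such_key is invented purely to show the failing case:

```python
import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false'

match = re.search(r'(?<="posted_data":")\w*(?=")', my_text)
# match[0] (Python 3.6+) and match.group(0) both return the whole match.
value = match[0] if match is not None else None
print(value)

# A key that is not in the text yields None; indexing None would raise TypeError.
missing = re.search(r'(?<="no_such_key":")\w*(?=")', my_text)
print(missing)
```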

Your code prints None because the pattern has an extra space after the colon, and the text has no space there. It should be:

extract_posted_data = re.search(r'(\"posted_data\":\")(\w*)', my_text)

In fact, the backslashes before the quotes are unnecessary here. Just:

extract_posted_data = re.search(r'("posted_data":")(\w*)', my_text)

Then:

extract_posted_data.group(2)

is what you want.

>>> import re
>>> my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
>>> extract_posted_data = re.search(r'("posted_data":")(\w*)', my_text)   
>>> extract_posted_data.group(2)
'2e54eba66f8f2881c8e78be8342428xd'
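
To see the effect of the extra space and what each group holds, the two patterns can be compared side by side. This is just an illustrative sketch; the variable names with_space and without_space are mine:

```python
import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

# Pattern from the question: the space after the colon has nothing to match.
with_space = re.search(r'("posted_data": ")(\w*)', my_text)
print(with_space)

# Same pattern without the space.
without_space = re.search(r'("posted_data":")(\w*)', my_text)
print(without_space.group(0))  # whole match: the prefix followed by the value
print(without_space.group(1))  # first group: the literal prefix
print(without_space.group(2))  # second group: the value
```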

As others have mentioned, json would be a better tool for this data, but you can also use this regex (I added a \s* in case there are spaces in between in the future):

regex: "posted_data":\s*"(?P<posted_data>[^"]+)"

import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
m = re.search(r'"posted_data":\s*"(?P<posted_data>[^"]+)"', my_text)
if m:
    print(m.group('posted_data'))
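
The same pattern can be generalised to any key. The sketch below is an assumption-laden variation rather than the pattern above: quoted_value is a hypothetical helper, it also tolerates whitespace before the colon, it accepts empty values, and it does not handle escaped quotes inside a value.

```python
import re


def quoted_value(text, key):
    """Return the quoted string stored under key, or None if not found.

    Hypothetical helper: only matches values written in double quotes and
    does not understand backslash-escaped quotes inside the value.
    """
    # re.escape guards against regex metacharacters in the key name.
    pattern = r'"%s"\s*:\s*"([^"]*)"' % re.escape(key)
    m = re.search(pattern, text)
    return m.group(1) if m else None


my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
print(quoted_value(my_text, 'posted_data'))
print(quoted_value(my_text, 'rx'))
```

Unquoted values such as the false under isropa are not matched, so the helper returns None for them.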
