I have spent many hours on this and I am exhausted. I know regex is a very powerful tool, but it is too difficult for me. Please help me. I want to extract a JSON string from HTML pages. This is an example of the nested JSON:
<script>
window.__INITIAL_STATE__ = {
"properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
};
</script >
I wrote a regular expression like this to extract all the text enclosed in curly braces {}:
pattern = r'(\{.*\s*\});\s*<'
But it returns only part of the string:
{"b":"C", "D":"E"}
}
Could you advise me on how to write a regular expression that extracts the whole string enclosed in {}, please?
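The pattern stops short because . does not match a newline by default, so .* cannot run from the first { down to the closing }. The only { from which the rest of the pattern can succeed is the inner one on the line just above the final }. As a minimal sketch with the standard re module, and assuming the object is always followed by }; and then a tag as in the sample, adding re.DOTALL is enough. The names html, pattern and extracted are only illustrative.

```python
import re

html = """
<script>
window.__INITIAL_STATE__ = {
"properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
};
</script >
"""

# Same idea as the pattern in the question, but DOTALL lets `.` match
# newlines, so the greedy `.*` runs from the first `{` to the last `}`
# that is followed by `;` and `<`.
pattern = re.compile(r'(\{.*\})\s*;\s*<', re.DOTALL)

match = pattern.search(html)
extracted = match.group(1) if match else None
print(extracted)
```

Because .* is greedy, this takes everything up to the last }; followed by < on the page, so it can overshoot when the page contains several such script blocks. It does not check that the braces are balanced.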
I'm not sure this is exactly what you want, but to capture the outer curly braces as well you need a recursive pattern. The built-in re module does not support recursion; it only works with the newer third-party regex module. Consider:
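import regex as re  # third-party module (pip install regex), not the built-in re
# "{", then any mix of non-brace text or a recursive match of the whole pattern (?R), then "}"
rx = re.compile(r'\{(?:[^{}]*|(?R))*\}')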
junk = """
<script>
window.__INITIAL_STATE__ = {
"properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
};
</script >
"""
for match in rx.finditer(junk):
    print(match.group(0))
Which yields
{
"properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
}
See a demo of the expression on regex101.com.
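If the goal is the parsed data rather than the raw text, a regex may not be needed at all. The sketch below is an alternative to the recursive pattern, not a replacement for it: it uses json.JSONDecoder.raw_decode from the standard library, which parses one JSON value starting at a given index and ignores whatever follows. It assumes the value assigned to window.__INITIAL_STATE__ is strict JSON (double-quoted keys and strings, no JavaScript-only syntax), as in the sample. The helper name extract_state is hypothetical.

```python
import json

html = """
<script>
window.__INITIAL_STATE__ = {
"properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
};
</script >
"""

def extract_state(page, marker="window.__INITIAL_STATE__"):
    """Return the object assigned to `marker` as a dict, or None if absent.

    Hypothetical helper; assumes the assigned value is strict JSON.
    """
    start = page.find(marker)
    if start == -1:
        return None
    brace = page.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses a single JSON value beginning at index `brace`
    # and returns it with the index where parsing stopped, so the
    # trailing `;` and closing tag are never looked at.
    obj, _end = json.JSONDecoder().raw_decode(page, brace)
    return obj

state = extract_state(html)
print(state["aaa"])
```

Nested braces, and braces inside string values, are handled by the JSON parser itself. If the embedded value is not valid JSON, raw_decode raises json.JSONDecodeError.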