Regular expression to find all text inside of { } including spaces and newlines

Question

I have spent many hours on this and am exhausted. I know regex is a powerful tool, but it is too difficult for me. Please help. I want to extract a JSON string from HTML pages. This is an example of the nested JSON:

<script>

            window.__INITIAL_STATE__ = {
       "properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
            };

        </script >

I wrote a regular expression like this to extract all the text enclosed in curly braces {}:

pattern = r'(\{.*\s*\});\s*<'

But it returns only part of the string:

{"b":"C", "D":"E"}
            }

Could you advise me on how to write a regular expression that extracts the whole string enclosed in {}?

Answer 1

Not sure if this is what you want, but to capture the outer curly braces as well you'll need a recursive pattern. The standard-library re module does not support recursion; the third-party regex module (pip install regex) does. Consider

import regex as re

# "{", then any mix of non-brace text or a nested {...} (via the recursion (?R)), then "}"
rx = re.compile(r'\{(?:[^{}]*|(?R))*\}')


junk = """
<script>

            window.__INITIAL_STATE__ = {
       "properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
            };

        </script >
"""

for match in rx.finditer(junk):
    print(match.group(0))

Which yields

{
       "properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
            }

See a demo for the expression on regex101.com.
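
If installing the regex module is not an option, a non-recursive pattern with the standard re module may be enough for this particular page. The original attempt stops early because "." does not match newlines by default; re.DOTALL changes that. The following is only a sketch: it assumes the object is always terminated by "};" directly before the closing script tag, and that the payload is strict JSON so json.loads can read it.

```python
import json
import re

html = """
<script>

            window.__INITIAL_STATE__ = {
       "properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
            };

        </script >
"""

# Assumption: the object literal ends with "};" followed only by whitespace
# and the closing </script tag. re.DOTALL lets "." cross line breaks, and the
# lazy ".*?" stops at the first "}" that is followed by ";" and "</script".
pattern = re.compile(
    r'window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*;\s*</script',
    re.DOTALL,
)

match = pattern.search(html)
if match:
    raw = match.group(1)    # text from the outer "{" to its closing "}"
    data = json.loads(raw)  # only works if the payload is strict JSON
    print(data["aaa"])
```

This relies on the surrounding markup rather than on counting braces, so it will break if the page layout changes.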

Obligatory warning: "parsing" stuff like this with regular expressions is usually not the way to go.
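
One regex-free alternative, in the spirit of that warning, is to let the json module find the end of the object itself. json.JSONDecoder.raw_decode parses one JSON value starting at a given index and returns both the parsed object and the index where it stopped, so nested braces are handled by a real parser. This is a minimal sketch that assumes the payload is strict JSON (not a looser JavaScript object literal) and that the "window.__INITIAL_STATE__ =" marker from the question appears once in the page.

```python
import json

html = """
<script>

            window.__INITIAL_STATE__ = {
       "properties":"ASSET_HOST", "https"
:"//asom","recaptcha":"ABCD", "aaa": {"b":"C", "D":"E"}
            };

        </script >
"""

marker = "window.__INITIAL_STATE__ ="

# Locate the first "{" after the assignment; raw_decode does not skip
# leading whitespace, so the start index must point at the brace itself.
start = html.index("{", html.index(marker))

# raw_decode returns the parsed value and the index just past its end.
state, end = json.JSONDecoder().raw_decode(html, start)

json_text = html[start:end]  # the exact substring, outer braces included
print(state["aaa"])
```

If the page embeds a JavaScript literal that is not valid JSON (single quotes, trailing commas, unquoted keys), raw_decode raises json.JSONDecodeError and a different approach is needed.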

solution1
4 ACCEPTED 2019-11-21 21:34:46
