
Using regex to parse a string in Python 3

I am trying to access gSecureToken from the following string:

$("#ejectButton").on("click", function(e) {
            $("#ejectButton").prop("disabled", true);
            $.ajax({
                url : "/apps_home/eject/",
                type : "POST",
                data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
                dataType : "json",
                success : function(data, textStatus, xhr) {
                    $("#smbStatus").html('');
                    $("#smbEnable").removeClass('greenColor').html('OFF');
                    showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
                },
                error : function(xhr, textStatus, errorThrown) {
                    //undoChange($toggleSwitchElement);
                    // If auth session has ended, force a new login with a fresh GET.
                    if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
                }
            });

How can I use regex to parse the value out of the string? I know once I have it parsed I will be able to load it as JSON.

My current code doesn't use a regex; it just uses BeautifulSoup to parse some HTML. Here is my code so far:

from bs4 import BeautifulSoup

class SecureTokenParser:

    @staticmethod
    def parse_secure_token_from_html_response(html_response):
        soup = BeautifulSoup(html_response, 'html.parser')
        for script_tag in soup.find_all("script", type="text/javascript"):
            print(script_tag)

I know it's not much, but I figured it was a good starting point to print the contents to the terminal. How can I use regex to parse out the gSecureToken and then load it as JSON?

You didn't show us what print() displays, so let's assume it resembles s below.

Use this to parse it:

import re


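def parse_token(s: str):
    # Match the quoted key followed by a 40-character token value.
    token_re = re.compile(r'"gSecureToken": "(\w{40})"')
    m = token_re.search(s)
    # search() returns None when nothing matches, so guard before calling group().
    return m.group(1) if m else None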


s = '{"url": "/apps_home/eject/", "type": "POST", "data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}, "dataType": "json"}'
print(parse_token(s))
print(dict(data=dict(gSecureToken=parse_token(s))))

Feel free to use \w+ if a fixed 40 characters is too restrictive. The documentation for the re module is at: https://docs.python.org/3/library/re.html
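If the printed script turns out to be the raw JavaScript from the question rather than the JSON-like s above, the key is unquoted and there are spaces around the colon, so the stricter pattern would not match. A minimal sketch of a more tolerant variant follows; the names TOKEN_RE and parse_token_lenient are made up for illustration, and the pattern assumes the token consists only of word characters.

```python
import re

# Hypothetical, more tolerant pattern (an assumption, not from the original answer):
# optional quotes around the key, optional whitespace around the colon,
# and any non-empty run of word characters as the value.
TOKEN_RE = re.compile(r'"?gSecureToken"?\s*:\s*"(\w+)"')


def parse_token_lenient(s: str):
    """Return the token value, or None when the pattern is not found."""
    m = TOKEN_RE.search(s)
    return m.group(1) if m else None


json_like = '{"data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}}'
js_like = 'data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },'

print(parse_token_lenient(json_like))
print(parse_token_lenient(js_like))
print(parse_token_lenient("no token here"))
```

Returning None instead of raising lets the caller decide what to do when a page has no token.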

Your "... and then load it as JSON?" remark doesn't appear to be relevant: once a regex has extracted the value, there are no parsing tasks left over for JSON to attend to. (I would probably have started with json.loads() from the get-go, rather than using a regex, since the data appears to be JSON formatted.)
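One caveat on the json.loads() route, sketched here under the assumption that the real input may be the JavaScript from the question: json.loads() accepts only strict JSON, so it handles a string shaped like s but rejects an object literal with an unquoted key. The helper name try_load is hypothetical.

```python
import json

# Strict JSON: the key is double-quoted.
json_fragment = '{"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}'
# JavaScript object literal as in the question: the key is unquoted.
js_fragment = '{ gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" }'


def try_load(text: str):
    """Return the decoded object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(try_load(json_fragment))
print(try_load(js_fragment))
```

If the second call is the one that matches your data, a regex (or a JavaScript-aware parser) is the more practical starting point.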

A non-regex, non-BS4 option:

html_response = [your string above]

# Split at the start of the data object, then cut at the end of that object.
splt = html_response.split(' : { ')
splt[1].split('},\n')[0]

Output:

'gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" '
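To go from that fragment to the bare token without a regex, one possible follow-up step is str.partition plus strip. This is only a sketch: it assumes the fragment has exactly the shape shown in the output above, and the variable names are illustrative.

```python
# The fragment produced by the two split() calls above.
fragment = 'gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" '

# partition() splits at the first colon into (key, separator, value).
key, _, raw_value = fragment.partition(':')

# Drop the surrounding whitespace, then the surrounding double quotes.
token = raw_value.strip().strip('"')

print(key.strip())
print(token)
```

Like the split-based approach itself, this breaks as soon as the spacing in the page changes, so treat it as a quick hack.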

No need to rely on a large package like BeautifulSoup for this; you can easily parse out the value of gSecureToken using just Python's re module.

I'm assuming you want to parse out just the value of gSecureToken. In that case, you can create a regular expression pattern:

import re

pattern = r'{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*}'

Then, we can use, for example, your test string:

test_str = """
$("#ejectButton").on("click", function(e) {
            $("#ejectButton").prop("disabled", true);
            $.ajax({
                url : "/apps_home/eject/",
                type : "POST",
                data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
                dataType : "json",
                success : function(data, textStatus, xhr) {
                    $("#smbStatus").html('');
                    $("#smbEnable").removeClass('greenColor').html('OFF');
                    showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
                },
                error : function(xhr, textStatus, errorThrown) {
                    //undoChange($toggleSwitchElement);
                    // If auth session has ended, force a new login with a fresh GET.
                    if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
                }
            });
"""

And finally we can search the test string for our regular expression:

match = re.search(pattern, test_str)
matching_string = match.groups()[0]
print(matching_string)

Which gives us the value desired:

7b9854390a079b03cce068b577cd9af6686826b8

You can see why this regular expression works by visiting this link: www.regexr.com/4ihpd
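To tie this back to the "load it as JSON" part of the question, here is a rough sketch that applies the same pattern to a whole HTML response and returns the token as a JSON string. The function name secure_token_as_json and the sample markup are assumptions for illustration, and the opening brace is escaped only for readability.

```python
import json
import re

# Same idea as the pattern above, with the braces escaped explicitly.
TOKEN_PATTERN = re.compile(r'\{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*\}')


def secure_token_as_json(html_response: str):
    """Return a JSON string holding the token, or None if no token is found."""
    match = TOKEN_PATTERN.search(html_response)
    if match is None:
        return None
    return json.dumps({"gSecureToken": match.group(1)})


# Made-up sample standing in for the real response body.
sample = (
    '<script type="text/javascript">'
    '$.ajax({ data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },'
    ' dataType : "json" });'
    '</script>'
)

payload = secure_token_as_json(sample)
print(payload)
print(json.loads(payload)["gSecureToken"])
```

Because the regex runs over the raw response text, no HTML parser is needed for this particular extraction.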

