Using regex to parse a string in Python 3

I am trying to access gSecureToken from the following string:

$("#ejectButton").on("click", function(e) {
            $("#ejectButton").prop("disabled", true);
            $.ajax({
                url : "/apps_home/eject/",
                type : "POST",
                data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
                dataType : "json",
                success : function(data, textStatus, xhr) {
                    $("#smbStatus").html('');
                    $("#smbEnable").removeClass('greenColor').html('OFF');
                    showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
                },
                error : function(xhr, textStatus, errorThrown) {
                    //undoChange($toggleSwitchElement);
                    // If auth session has ended, force a new login with a fresh GET.
                    if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
                }
            });

How can I use regex to parse the value out of the string? I know that once I have it parsed I will be able to load it as JSON.

My current code doesn't use a regex; it just uses BeautifulSoup to parse some HTML. Here is my code so far:

from bs4 import BeautifulSoup

class SecureTokenParser:

    @staticmethod
    def parse_secure_token_from_html_response(html_response):
        soup = BeautifulSoup(html_response, 'html.parser')
        for script_tag in soup.find_all("script", type="text/javascript"):
            print(script_tag)

I know it's not much, but I figured printing the contents to the terminal was a good starting point. How can I use regex to parse out the gSecureToken and then load it as JSON?

You haven't shown us what print() displays, but suppose it resembles s below.

Use this to parse it:

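import re


def parse_token(s: str) -> str:
    # Capture the 40 word characters between the quotes that follow the key.
    token_re = re.compile(r'"gSecureToken": "(\w{40})"')
    m = token_re.search(s)
    if m is None:
        # search() returns None when nothing matches; fail with a clear message.
        raise ValueError("gSecureToken not found")
    return m.group(1)


s = '{"url": "/apps_home/eject/", "type": "POST", "data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}, "dataType": "json"}'
print(parse_token(s))  # the bare token
print(dict(data=dict(gSecureToken=parse_token(s))))  # token wrapped back into a nested dict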

Feel free to use \w+ if a fixed 40 characters is too restrictive. The documentation for the re module is at: https://docs.python.org/3/library/re.html
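As a small sketch of the difference between the two quantifiers, the snippet below runs both patterns against a made-up six-character token. The names STRICT, LOOSE and short are illustrative and not part of the original answer.

```python
import re

STRICT = re.compile(r'"gSecureToken": "(\w{40})"')  # exactly 40 word characters
LOOSE = re.compile(r'"gSecureToken": "(\w+)"')      # one or more word characters

short = '{"gSecureToken": "abc123"}'  # made-up 6-character token

print(STRICT.search(short))          # None: the token is not 40 characters long
print(LOOSE.search(short).group(1))  # abc123
```

The stricter pattern is arguably safer when the token length is known, because a malformed token then fails to match instead of slipping through.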

Your "... and then load it as JSON?" remark doesn't appear to be relevant: once a regex has extracted the token, there are no parsing tasks left over for JSON to attend to. (I would probably have started with json.loads() from the get-go rather than a regex, since the data appears to be JSON formatted.)
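For comparison, here is a minimal sketch of the json.loads() route, assuming the text really is valid JSON like the string s used in this answer. The JavaScript in the question has unquoted keys, so json.loads() would reject that text as it stands.

```python
import json

s = '{"url": "/apps_home/eject/", "type": "POST", "data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}, "dataType": "json"}'

settings = json.loads(s)                  # parse the whole object
token = settings["data"]["gSecureToken"]  # then index into the nested dict
print(token)
```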

A non-regex, non-BS4 option:

html_response = [your string above]

splt = html_response.split(' : { ')
splt[1].split('},\n')[0]

Output:

'gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" '
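That output still contains the key and the surrounding quotes. One possible way to reduce it to the bare value, still without a regex, is str.partition followed by strip. The variable names below are illustrative.

```python
fragment = 'gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" '

# Split once at the first colon, then trim whitespace and the double quotes.
key, _, value = fragment.partition(':')
token = value.strip().strip('"')
print(key.strip(), token)
```

This depends on the exact spacing and layout of the page, so it may break if the markup changes even slightly.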

No need to rely on a large package like BeautifulSoup for this; you can easily parse out the value of gSecureToken using just the Python re module.

I'm assuming you want to parse out just the value of the gSecureToken. Then, you can create a regular expression pattern:

import re

pattern = r'{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*}'

Then, we can use, for example, your test string:

test_str = """
$("#ejectButton").on("click", function(e) {
            $("#ejectButton").prop("disabled", true);
            $.ajax({
                url : "/apps_home/eject/",
                type : "POST",
                data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
                dataType : "json",
                success : function(data, textStatus, xhr) {
                    $("#smbStatus").html('');
                    $("#smbEnable").removeClass('greenColor').html('OFF');
                    showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
                },
                error : function(xhr, textStatus, errorThrown) {
                    //undoChange($toggleSwitchElement);
                    // If auth session has ended, force a new login with a fresh GET.
                    if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
                }
            });
"""

And finally we can search the test string for our regular expression:

match = re.search(pattern, test_str)
matching_string = match.groups()[0]
print(matching_string)

Which gives us the desired value:

7b9854390a079b03cce068b577cd9af6686826b8

You can see why this regular expression works by visiting this link: www.regexr.com/4ihpd
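To connect this with the "load it as JSON" part of the question, here is a rough sketch that searches the raw response text with the same pattern and wraps the token in a JSON string. The function name token_payload and the shortened html sample are hypothetical, for illustration only.

```python
import json
import re

PATTERN = re.compile(r'{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*}')


def token_payload(html_response: str) -> str:
    """Find the token in raw HTML/JS text and return it as a JSON string."""
    match = PATTERN.search(html_response)
    if match is None:
        raise ValueError("gSecureToken not found in response")
    return json.dumps({"gSecureToken": match.group(1)})


# Hypothetical, shortened response for illustration only.
html = '''<script type="text/javascript">
$.ajax({ url : "/apps_home/eject/", data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" } });
</script>'''

payload = token_payload(html)
print(payload)                              # JSON text containing the token
print(json.loads(payload)["gSecureToken"])  # round-trips back to the bare token
```

Raising ValueError on a missing match is a design choice; returning None would work just as well if the caller prefers to check for it.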
