Parse JSON from HTML responseText

Python's webob module returns text/html responses by default, specifically for server errors, and these end up embedding the error JSON payload within the body of the HTML. The responseText contains the following:

<html>
<head>
  <title>503 Service Unavailable</title>
</head>
<body>
<h1>503 Service Unavailable</h1>
{
    "status": "object-specific error",
    "payload": {
            "Message": "Unable to list resources",
            "HTTP Method": "GET",
            "URI": "api/myManager/1.0/Node",
            "Operation": "LIST",
            "Object": {
                    "Name": "myManager.Node",
                    "Interface": "Node"
            },
            "Version": {
                    "Major": 1,
                    "Minor": 0
            }
       }
}<br /><br />
</body>
</html>

Using JavaScript on the client side, what is the best approach to extract the JSON object that's embedded within the HTML?

I agree that, in general, the better solution is to ensure the server returns only JSON. However, a quick way to do this with JavaScript on the client side is, as @Barmer suggested, to parse the HTML into a DOM, get the text child node inside body, and run JSON.parse on it.

var responseStr = '<html>' +
                  '<head>' +
                  '  <title>503 Service Unavailable</title>' +
                  '</head>' +
                  '<body>' +
                  '<h1>503 Service Unavailable</h1>' +
                  '{' +
                  '  "status": "object-specific error",' +
                  '  "payload": {' +
                  '    "Message": "Unable to list resources",' +
                  '    "HTTP Method": "GET",' +
                  '    "URI": "api/myManager/1.0/Node",' +
                  '    "Operation": "LIST",' +
                  '    "Object": {' +
                  '      "Name": "myManager.Node",' +
                  '      "Interface": "Node"' +
                  '    },' +
                  '    "Version": {' +
                  '      "Major": 1,' +
                  '      "Minor": 0' +
                  '    }' +
                  '  }' +
                  '}<br /><br />' +
                  '</body>' +
                  '</html>';
var parser = new DOMParser();
var doc = parser.parseFromString(responseStr, "text/html");
var nodes = doc.body.childNodes;
var json_obj;

// Find the first non-empty text node directly under <body> and parse it.
for (var i = 0, len = nodes.length; i < len; i++) {
    var text = nodes[i].nodeType === Node.TEXT_NODE ? nodes[i].data.trim() : "";
    // Skip whitespace-only text nodes (e.g. newlines between tags).
    if (text) {
        json_obj = JSON.parse(text);
        break;
    }
}

// You can access json directly now e.g.
console.log(json_obj.status);
console.log(json_obj.payload['HTTP Method']);

Using a regex to parse (not really reliable, but efficient):

import re
import json

content = """\
<html>
<head>
  <title>503 Service Unavailable</title>
</head>
<body>
<h1>503 Service Unavailable</h1>
{
    "status": "object-specific error",
    "payload": {
            "Message": "Unable to list resources",
            "HTTP Method": "GET",
            "URI": "api/myManager/1.0/Node",
            "Operation": "LIST",
            "Object": {
                    "Name": "myManager.Node",
                    "Interface": "Node"
            },
            "Version": {
                    "Major": 1,
                    "Minor": 0
            }
       }
}<br /><br />
</body>
</html>"""

mo = re.search(r"</h1>(.*?)<br", content, flags=re.DOTALL)
if mo:
    data = mo.group(1)
    obj = json.loads(data)
    print(obj)

You'll get:

{'payload': {'Operation': 'LIST', 'HTTP Method': 'GET',
'URI': 'api/myManager/1.0/Node',
'Message': 'Unable to list resources',
'Version': {'Major': 1, 'Minor': 0},
'Object': {'Interface': 'Node', 'Name': 'myManager.Node'}},
'status': 'object-specific error'}
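
The regex above depends on the exact `</h1>` and `<br` markers around the payload. A variant that depends only on the JSON itself may tolerate small changes in the wrapper markup. The following is a minimal sketch, not a definitive implementation: `extract_embedded_json` is a hypothetical helper name, and it assumes the payload is a JSON object and is the first `{`-delimited text in the response that parses. It uses `json.JSONDecoder.raw_decode` from the standard library, which parses one JSON value at a given offset and ignores trailing text.

```python
import json


def extract_embedded_json(text):
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses a single JSON value beginning at `start`
            # and ignores whatever follows it (here, "<br /><br />...").
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not begin valid JSON; try the next one.
            start = text.find("{", start + 1)
    return None


# Shortened, hypothetical sample in the same shape as the response above.
sample = (
    "<body><h1>503 Service Unavailable</h1>\n"
    '{"status": "object-specific error", "payload": {"Operation": "LIST"}}'
    "<br /><br /></body>"
)
result = extract_embedded_json(sample)
print(result["status"])
```

Because the decoder stops at the end of the object, no closing marker has to be matched, so the trailing `<br />` tags do not matter.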

Or, using lxml:

import json
from lxml import etree

content = """\
<html>
...
</html>"""

tree = etree.XML(content)

h1 = tree.xpath("/html/body/h1[1]")[0]
data = h1.tail
obj = json.loads(data)

This gives the same result.
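
If lxml is not available, or the response is not well-formed XML, the same text-node idea can be sketched with the standard library's `html.parser`. This is only an illustration under stated assumptions: `TextNodeJSONFinder` and `json_from_html` are hypothetical names, the whole response is fed in one call, and the JSON text contains no literal `<` character (which could cause the parser to deliver the text in more than one chunk).

```python
import json
from html.parser import HTMLParser


class TextNodeJSONFinder(HTMLParser):
    """Keeps the first text node that parses as a JSON object."""

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace, headings, and anything after the first match.
        if self.found is not None or not text.startswith("{"):
            return
        try:
            self.found = json.loads(text)
        except json.JSONDecodeError:
            pass  # Not JSON after all; keep looking.


def json_from_html(html_text):
    """Return the JSON object embedded as text in html_text, or None."""
    finder = TextNodeJSONFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


# Shortened, hypothetical sample in the same shape as the response above.
sample = """<html>
<head><title>503 Service Unavailable</title></head>
<body>
<h1>503 Service Unavailable</h1>
{
    "status": "object-specific error",
    "payload": {"Version": {"Major": 1, "Minor": 0}}
}<br /><br />
</body>
</html>"""
obj = json_from_html(sample)
print(obj["status"])
```

Unlike the XPath version, this does not assume the JSON directly follows the first h1; it accepts any text node that parses as a JSON object.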


 