
How to grab a value from a JSON response in Selenium (Python)

My page returns a JSON HTTP response that contains id: 14

Is there a way to grab this in Selenium with Python? I searched the web and could not find any solutions, so now I am wondering whether it is simply not possible. I could read this id from the database, but I am trying to avoid that. Please tell me if there is any way around this. Thank you.

The source of your difficulty is that when a browser receives raw JSON data, it wraps it in a tiny bit of HTML so that it can display it to the user on the screen.

When I visit https://httpbin.org/user-agent in Firefox, for example, the following raw JSON appears in my browser window:

{"user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0"
}

But in fact Firefox (like Chrome) has wrapped the JSON in a bit of extra HTML in order to create a document it can actually display. Here is the HTML that Firefox wraps it in, which I can see in the JavaScript console by evaluating the expression document.documentElement.innerHTML:

<head><link rel="alternate stylesheet" type="text/css"
 href="resource://gre-resources/plaintext.css" title="Wrap Long Lines"></head>
 <body><pre>{"user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:42.0)
 Gecko/20100101 Firefox/42.0"
}
</pre></body>
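
To see why the wrapper matters, here is a small sketch showing that the wrapped page source is not valid JSON while the text inside it is. The sample strings and the helper name is_json are made up for illustration; they stand in for what a driver would report.

```python
import json


def is_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical stand-ins for what a driver would report for a JSON page.
page_source = '<html><head></head><body><pre>{"id": 14}</pre></body></html>'
body_text = '{"id": 14}'

print(is_json(page_source))  # the HTML wrapper is not JSON
print(is_json(body_text))    # the text inside the wrapper is
```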

Using BeautifulSoup to parse the HTML, as suggested in another answer, has two serious disadvantages: it introduces a new dependency to your project, and it is slower than taking advantage of the fact that the browser has already parsed the HTML for you and has the resulting DOM ready for your use.

To have the browser extract the JSON for you, simply ask it for the text inside the <body> element. All of the extra structure that the browser has added will be excluded, and the pure JSON will be returned:

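from selenium.webdriver.common.by import By
driver.find_element(By.TAG_NAME, 'body').text  # find_element_by_tag_name was removed in Selenium 4.3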

Or, if you want it parsed into a Python data structure:

import json
from selenium.webdriver.common.by import By
json.loads(driver.find_element(By.TAG_NAME, 'body').text)
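
Applied to the question's id: 14, the last step might look like the sketch below. The helper name extract_field and the sample text are assumptions for illustration; a literal string stands in for the <body> text so the snippet runs without a browser.

```python
import json


def extract_field(body_text, field):
    """Parse the JSON text shown by the browser and return one top-level field."""
    data = json.loads(body_text)
    return data[field]


# With a live Selenium session the text would come from the <body> element:
#   body_text = driver.find_element(By.TAG_NAME, "body").text
# Here a hypothetical literal stands in for it.
sample_body_text = '{"id": 14}'
print(extract_field(sample_body_text, "id"))
```

If the key might be absent, data.get(field) would return None instead of raising KeyError.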

You can use BeautifulSoup to parse the page and extract the JSON. The code you need should look something like this. You may need to change the soup.find call if the JSON isn't directly in the body of the response.

from bs4 import BeautifulSoup
import json

soup = BeautifulSoup(driver.page_source, "html.parser")  # name the parser explicitly to avoid a warning
dict_from_json = json.loads(soup.find("body").text)
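
If the extra dependency is the concern, roughly the same extraction can be sketched with the standard library's html.parser module. This is a swapped-in technique, not what the answer above uses, and the class name, function name, and sample page source are assumptions for illustration.

```python
import json
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect the text that appears between <body> and </body>."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.chunks.append(data)


def json_from_page_source(page_source):
    """Strip the browser's HTML wrapper and parse the JSON inside it."""
    parser = BodyTextExtractor()
    parser.feed(page_source)
    parser.close()
    return json.loads("".join(parser.chunks))


# Hypothetical page source, shaped like the wrapper a browser adds.
sample_source = '<html><head></head><body><pre>{"id": 14}</pre></body></html>'
print(json_from_page_source(sample_source))
```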

The other solutions didn't work for me. I found this solution using requests to be fast and simple:

import requests
requests.get(browser.current_url).json()
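
One caveat worth keeping in mind: requests.get sends a new HTTP request of its own and does not share the browser's session, so a page behind a login may respond differently, and an endpoint that creates a record on each call would run again. If the session cookies are all that is missing, a sketch like the following copies them over from Selenium's get_cookies(). The helper name and the sample cookie are hypothetical.

```python
def cookies_for_requests(selenium_cookies):
    """Turn Selenium's list of cookie dicts into a plain {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# With a live session this would be used roughly as:
#   cookies = cookies_for_requests(browser.get_cookies())
#   data = requests.get(browser.current_url, cookies=cookies).json()
# A hypothetical cookie list stands in for browser.get_cookies() here.
sample_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
]
print(cookies_for_requests(sample_cookies))
```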


 