Scraping Javascript-rendered page that requires login using Python

My issue is that I can't scrape a website that requires login when the page is rendered using Javascript.

I can easily log in using this code:

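import requests
from lxml import html

# login_url is assumed to be defined earlier (the URL of the site's login form).
payload = {
    "username": "username",
    "password": "password",
}

# A session keeps cookies between requests, so the login persists.
session_requests = requests.Session()

# Fetch the login page first so the session picks up any initial cookies.
result = session_requests.get(login_url)
tree = html.fromstring(result.text)

# Submit the credentials to the login form.
result = session_requests.post(
    login_url,
    data=payload,
    headers={"referer": login_url},
)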

Then I can get some values using this code:

# agent_url is the page to scrape; it is fetched with the logged-in session.
result = session_requests.get(agent_url, headers={"referer": agent_url})
tree = html.fromstring(result.content)
needed_info = tree.xpath(
    "//div[@class='col-md-6']/div[@class='table-responsive']/table/tbody/tr[22]/td[2]"
)[0].text

However, not everything is rendered: the parts of the page that are built by Javascript are missing from the response.

I've also tried dryscrape, but it does not work on Windows. Selenium is just too heavy for my needs, and I'm having issues installing Spynner (probably because it does not support Python 3.6?).

What would you recommend?

I just went and did it using Selenium. Everything else was too much of a hassle for this little project.
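
The answer does not include its code, so here is only a minimal sketch of what a Selenium version might look like, not the code that was actually used. It assumes Selenium 4 with Chrome installed, and that the login form's fields are named "username" and "password" (guessed from the payload keys in the question). The helper names make_driver and scrape_needed_info are hypothetical. The XPath is the one from the question.

```python
# XPath taken from the question; a real browser DOM normally contains <tbody>.
XPATH = (
    "//div[@class='col-md-6']/div[@class='table-responsive']"
    "/table/tbody/tr[22]/td[2]"
)


def make_driver():
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome)."""
    # Imported here so the scraping function below does not depend on it.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def scrape_needed_info(driver, login_url, agent_url, username, password):
    """Log in through the browser, then read one cell from the rendered page."""
    # Wait up to 10 seconds for elements that Javascript adds after page load.
    driver.implicitly_wait(10)

    driver.get(login_url)
    # "name" and "xpath" are the string values of By.NAME and By.XPATH.
    # The field names are an assumption; check the real form's HTML.
    driver.find_element("name", "username").send_keys(username)
    password_field = driver.find_element("name", "password")
    password_field.send_keys(password)
    password_field.submit()
    # A real script may need an explicit wait here until the login completes.

    driver.get(agent_url)
    return driver.find_element("xpath", XPATH).text
```

To use it, create the driver with make_driver(), call scrape_needed_info(driver, login_url, agent_url, "username", "password"), and call driver.quit() in a finally block so the browser is closed even if the lookup fails. Passing the driver in as an argument keeps the scraping logic separate from the browser setup, so another browser can be swapped in.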
