Unable to scrape a piece of static information from a webpage

I've created a script in Python to log in to a webpage using credentials and then parse a piece of information, SIGN OUT, from another link (the script is supposed to get redirected to that link) to make sure I did log in.

Website address

I've tried with:

import requests
from bs4 import BeautifulSoup

url = "https://member.angieslist.com/gateway/platform/v1/session/login"
link = "https://member.angieslist.com/"

payload = {"identifier":"username","token":"password"}

with requests.Session() as s:
    s.post(url,json=payload,headers={
        "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
        "Referer":"https://member.angieslist.com/member/login",
        "content-type":"application/json"
        })

    r = s.get(link,headers={"User-Agent":"Mozilla/5.0"},allow_redirects=True)
    soup = BeautifulSoup(r.text,"lxml")
    login_stat = soup.select_one("button[class*='menu-item--account']").text
    print(login_stat)

When I run the above script, I get AttributeError: 'NoneType' object has no attribute 'text'. I take this to mean that something went wrong in my login process, as the information I wish to parse, SIGN OUT, is static content.

How can I parse this SIGN OUT information from that webpage?

This website requires JavaScript to work. You generate the login token correctly through the login API, but when you go to the home page, it makes multiple additional API calls and then updates the page.
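To see why the selector finds nothing, it can help to compare the markup before and after the scripts run. The sketch below uses only the standard library and two made-up HTML snippets: STATIC_HTML, RENDERED_HTML, and the helper names are illustrative assumptions, not the site's actual markup. It returns None when the button is absent, which is the same situation in which select_one returns None and .text raises the AttributeError.

```python
from html.parser import HTMLParser


class AccountButtonFinder(HTMLParser):
    """Collect the text of <button> tags whose class contains a marker."""

    def __init__(self, marker="menu-item--account"):
        super().__init__()
        self.marker = marker
        self._inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "button" and self.marker in (dict(attrs).get("class") or ""):
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "button":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def account_button_text(html):
    """Return the stripped button text, or None if no such button exists."""
    finder = AccountButtonFinder()
    finder.feed(html)
    return finder.texts[0].strip() if finder.texts else None


# Hypothetical markup: what a server might send before any script runs...
STATIC_HTML = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)
# ...and what the DOM might look like once the scripts have filled it in.
RENDERED_HTML = (
    '<html><body><div id="root">'
    '<button class="nav menu-item--account">SIGN OUT</button>'
    '</div></body></html>'
)

print(account_button_text(STATIC_HTML))    # None
print(account_button_text(RENDERED_HTML))  # SIGN OUT
```

Running the same kind of check on r.text from the requests session would show whether the button is present in the server response at all.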

So the issue has nothing to do with the login not working. You need to use something like Selenium for this:

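import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# Log in through the real login form so the site's JavaScript runs
driver.get("https://member.angieslist.com/member/login")
driver.find_element(By.NAME, "email").send_keys("none@getnada.com")
driver.find_element(By.NAME, "password").send_keys("NUN@123456")
driver.find_element(By.ID, "login--login-button").click()

# Give the page a few seconds to finish its follow-up API calls
time.sleep(3)

# Parse the rendered DOM rather than the raw server response
soup = BeautifulSoup(driver.page_source, "lxml")
login_stat = soup.select("[id*='menu-item']")

for item in login_stat:
    print(item.text)
print(login_stat)
driver.quit()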

I have mixed bs4 and Selenium here to keep it easy for you, but you can use Selenium alone if you prefer.
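The fixed time.sleep(3) is only a guess at how long the page needs to finish loading. Polling until the elements appear is usually more reliable. Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for this purpose; the plain-Python helper below is just a sketch of the idea. The name wait_for and its parameters are hypothetical, and the commented usage assumes the driver from the script above.

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.25):
    """Call fetch() until it returns something truthy or the timeout elapses.

    Returns the truthy result; raises TimeoutError if none arrives in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


# Hypothetical usage with Selenium only (needs a real browser session):
#   from selenium.webdriver.common.by import By
#   items = wait_for(
#       lambda: driver.find_elements(By.CSS_SELECTOR, "[id*='menu-item']")
#   )
#   for item in items:
#       print(item.text)
```

Because find_elements returns an empty list when nothing matches, the helper keeps polling until at least one menu item exists, and reading item.text directly from the driver removes the need for bs4.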
