Unable to scrape a piece of static information from a webpage
I've written a Python script that logs in to a webpage with my credentials and then parses one piece of information, SIGN OUT, from another link (the script is supposed to be redirected to that link) to confirm that the login succeeded.
I've tried with:
import requests
from bs4 import BeautifulSoup

url = "https://member.angieslist.com/gateway/platform/v1/session/login"
link = "https://member.angieslist.com/"
payload = {"identifier": "username", "token": "password"}

with requests.Session() as s:
    # Log in through the session API
    s.post(url, json=payload, headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
        "Referer": "https://member.angieslist.com/member/login",
        "content-type": "application/json"
    })
    # Fetch the home page and look for the account menu button
    r = s.get(link, headers={"User-Agent": "Mozilla/5.0"}, allow_redirects=True)
    soup = BeautifulSoup(r.text, "lxml")
    login_stat = soup.select_one("button[class*='menu-item--account']").text
    print(login_stat)
When I run the above script, I get the error AttributeError: 'NoneType' object has no attribute 'text', which suggests that something went wrong in my login process, since the information I want to parse, SIGN OUT, is static content.
How can I parse this SIGN OUT information from that webpage?
This website requires JavaScript to work. Although you generate the login token correctly through the login API, when you go to the home page it makes multiple additional API calls and then updates the page.
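To illustrate why select_one returns None here, the following is a minimal sketch. Both HTML strings are made-up stand-ins (the real markup of the site is not shown in this post): one imitates the empty shell that requests receives, the other imitates the page after the scripts have run. The helper name account_label is hypothetical, and html.parser is used instead of lxml only to keep the sketch dependency-light.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what requests receives: a shell that
# JavaScript would populate later. Not the site's real markup.
static_html = """
<html><body>
  <div id="app"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

# Hypothetical stand-in for the DOM after the scripts have run in a browser.
rendered_html = """
<html><body>
  <div id="app">
    <button class="menu-item menu-item--account">SIGN OUT</button>
  </div>
</body></html>
"""


def account_label(html):
    """Return the account button text, or None if the button is absent."""
    soup = BeautifulSoup(html, "html.parser")
    button = soup.select_one("button[class*='menu-item--account']")
    # select_one returns None when nothing matches, so check before using it
    return button.get_text(strip=True) if button is not None else None


print(account_label(static_html))    # None
print(account_label(rendered_html))  # SIGN OUT
```

Checking for None before reading .text avoids the AttributeError and makes it obvious when the element is simply not in the HTML that was downloaded.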
So the issue has nothing to do with the login not working. You need to use something like selenium for this:
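import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://member.angieslist.com/member/login")

# Fill in the login form and submit it
# (find_element(By...) replaces find_element_by_*, which newer Selenium releases removed)
driver.find_element(By.NAME, "email").send_keys("none@getnada.com")
driver.find_element(By.NAME, "password").send_keys("NUN@123456")
driver.find_element(By.ID, "login--login-button").click()

# Give the page time to finish its follow-up API calls
time.sleep(3)

soup = BeautifulSoup(driver.page_source, "lxml")
login_stat = soup.select("[id*='menu-item']")
for item in login_stat:
    print(item.text)
print(login_stat)
driver.quit()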
I have mixed bs4 and selenium here to make it easy for you, but you can use just selenium as well if you want.