I can't scrape a website whose URL doesn't change when more results are loaded, using requests and BeautifulSoup
Python BeautifulSoup requests
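import requests
import re
import os
import csv
from bs4 import BeautifulSoup

# `searche` (the search terms) and `headers` are defined elsewhere in the asker's script.
# The URL below is truncated in the original post.
for d in searche:
    truelink = d.replace(" ", "-")
    truelinkk = ('https://www.fb.com
    r = requests.get(truelinkk, headers=headers).text
    soup = BeautifulSoup(r, 'lxml')
    mobile = soup.find_all('li', class_='EIR5N')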
I am a beginner in Python. I can't scrape a website whose URL doesn't change when the next page is loaded via "load more", using requests and BeautifulSoup. Could someone visit the site and explain the procedure for scraping such a site with BeautifulSoup and requests? Any answer would be appreciated. Thank you. Please look at this link: https://www.olx.in/hyderabad_g4058526/q-Note-9-max-pro?isSearchCall=true
You can use Selenium in headless mode instead of requests. Even though Selenium is mainly intended for web automation, it can help in this case.
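import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

begin = time.time()  # start time, useful for measuring how long the scrape takes
options = Options()
options.add_argument('--headless')  # run Chrome without a window; options.headless is deprecated in Selenium 4
options.add_argument('--log-level=3')  # suppress most Chrome log output
driver = webdriver.Chrome(options=options)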
Since the URL doesn't change, you have to click the button you want by locating it with its XPath:
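from selenium.webdriver.common.by import By

# find_element_by_xpath was removed in Selenium 4; find_element(By.XPATH, ...) replaces it
driver.find_element(By.XPATH, 'xpath code').click()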
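A "load more" button usually has to be clicked several times before all results are on the page. The following is only a rough sketch of that loop, not something from the original answer: `click_until_gone` is a hypothetical helper name, the fixed `time.sleep` pause is a crude stand-in for a proper wait, and the XPath of the button still has to be taken from the actual page. It passes the string "xpath" (the value of Selenium's `By.XPATH`) to `find_elements`, which returns an empty list rather than raising when nothing matches.

```python
import time


def click_until_gone(driver, xpath, max_clicks=20, pause=2.0):
    """Click the element matching `xpath` until it disappears.

    Stops early after `max_clicks` clicks and returns the number of
    clicks performed. `driver` is expected to be a Selenium WebDriver.
    """
    clicks = 0
    while clicks < max_clicks:
        # "xpath" is the string value of selenium's By.XPATH locator
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break  # button no longer present: assume everything is loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the new items to be added
    return clicks
```

Assuming the page has already been opened with `driver.get(...)`, it could be called as `click_until_gone(driver, 'xpath code')`. The `max_clicks` cap is there so that a button that never disappears does not cause an endless loop.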
You can avoid using requests altogether and get the source code of the page with:
html_text = driver.page_source
soup = BeautifulSoup(html_text, 'lxml')
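Once the page source is in hand, the listings can be pulled out in the same way the question attempted. This is a minimal sketch under some assumptions: `extract_listings` is a hypothetical helper, the class name `EIR5N` is taken from the question and may well change since it looks auto-generated, and `sample_html` is made-up markup standing in for `driver.page_source`. It uses the built-in `html.parser` so it runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_listings(html_text, item_class="EIR5N"):
    """Return the visible text of every <li> that has the given class."""
    soup = BeautifulSoup(html_text, "html.parser")
    items = soup.find_all("li", class_=item_class)
    # Join the text fragments inside each item with single spaces
    return [item.get_text(" ", strip=True) for item in items]


# Made-up markup for illustration; in practice pass driver.page_source
sample_html = """
<ul>
  <li class="EIR5N"><span>Phone A</span> <span>Hyderabad</span></li>
  <li class="other">advert</li>
  <li class="EIR5N"><span>Phone B</span></li>
</ul>
"""

listings = extract_listings(sample_html)
print(listings)
```

With a real driver the call would be `extract_listings(driver.page_source)`, made after all the "load more" clicks have finished.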