
I can't scrape a website whose URL doesn't change when "Load more" loads the next page, using requests and BeautifulSoup

Tags: python, beautifulsoup, python-requests

import requests
import re
import os
import csv
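from bs4 import BeautifulSoup

# Hypothetical example values: the post's definitions of `headers` and
# `searche` did not survive.
headers = {'User-Agent': 'Mozilla/5.0'}
searche = ['Note 9 max pro']

for d in searche:
    truelink = d.replace(" ", "-")
    # The rest of this URL (presumably built from truelink) is truncated in the post.
    truelinkk = 'https://www.fb.com'
    r = requests.get(truelinkk, headers=headers).text
    soup = BeautifulSoup(r, 'lxml')
    # Only listings present in the initial HTML are found here. Items added by
    # "Load more" are fetched later by JavaScript, which requests never runs.
    mobile = soup.find_all('li', class_='EIR5N')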

I am a beginner to Python. I can't scrape a website whose URL doesn't change when "Load more" loads the next page, using requests and BeautifulSoup. Could someone visit the site and tell me the procedure for scraping it with BeautifulSoup and requests? Any answer would be appreciated. Thank you. Please look at this link: https://www.olx.in/hyderabad_g4058526/q-Note-9-max-pro?isSearchCall=true

You can use Selenium in headless mode instead of requests. Even though Selenium is meant for web automation, it can help you in this case.

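import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

begin = time.time()  # start time, e.g. to measure how long the scrape takes

options = Options()
# Run Chrome without a visible window. The older `options.headless = True`
# setter was deprecated and later removed in Selenium 4.
options.add_argument('--headless=new')
options.add_argument('--log-level=3')  # suppress most of Chrome's console logging
driver = webdriver.Chrome(options=options)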

Since the URL doesn't change, you have to click the button you want (e.g. "Load more") by locating it with its XPath:

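from selenium.webdriver.common.by import By

# find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH.
# Replace 'xpath code' with the XPath of the button.
driver.find_element(By.XPATH, 'xpath code').click()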

You can avoid requests entirely and get the page's source code from the driver, then hand it to BeautifulSoup:

html_text = driver.page_source
soup = BeautifulSoup(html_text, 'lxml')
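Once the full page source is available, the listings can be pulled out the same way the question tried with requests. A sketch, assuming each listing is still an `<li class="EIR5N">` as in the question's code (auto-generated class names like this tend to change, so check it in the browser's inspector). `extract_listings` is a hypothetical helper, and it uses the built-in `html.parser` so it does not need lxml:

```python
from bs4 import BeautifulSoup


def extract_listings(html_text, item_class="EIR5N"):
    """Return the text of every <li> with the given class, one string per listing."""
    soup = BeautifulSoup(html_text, "html.parser")
    listings = []
    for li in soup.find_all("li", class_=item_class):
        # A separator keeps title, price, etc. from running into each other
        listings.append(li.get_text(" | ", strip=True))
    return listings
```

For example, `for item in extract_listings(driver.page_source): print(item)`, followed by `driver.quit()` to close the browser.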
