
Accepting cookies while scraping page with requests and beautifulsoup

Original post: 2020-12-06 19:15:35. Tags: python, web-scraping, beautifulsoup, python-requests

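I wrote a script that tracks the price of products across many different pages. The problem is that some websites use cookies, and you have to click "accept cookies" before the price is shown.

This may not help much, but the website is in Swedish, so many of you won't be able to read it.

How do I accept cookies while web scraping?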

2 answers

No cookies are involved in the request itself. I don't think you should run into any problems when making a GET or POST request.
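As a hedged illustration of how requests treats cookies: a `requests.Session` keeps a cookie jar and sends its contents with every request, so there is no banner to click. If a site does check for a consent cookie on the server side, one option is to set that cookie by hand. The cookie name `cookie_consent`, its value `accepted`, and the helper names below are hypothetical; the real cookie name would have to be read from the browser's developer tools after accepting the banner manually.

```python
import requests


def make_session(consent_name="cookie_consent", consent_value="accepted"):
    """Return a Session that carries a pre-set consent cookie.

    The cookie name and value are placeholders (assumptions), not taken
    from any real site.
    """
    session = requests.Session()
    # Cookies stored on the session are sent with every request it makes.
    session.cookies.set(consent_name, consent_value)
    return session


def fetch(session, url):
    """GET a page; cookies set by the server are kept in session.cookies."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

Calling `fetch(make_session(), url)` returns the page HTML, and `session.cookies.get_dict()` shows which cookies the session is holding at that point. This only helps when the price is present in the server's HTML response; it does nothing for content that JavaScript adds later.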

Edit: try this code:

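import requests

# Fetch the page exactly as the script sees it (no JavaScript is executed)
r = requests.get('https://www.google.com/')

# Save the raw HTML; the with block closes the file automatically
with open('test.html', 'w', encoding='utf-8') as f:
    f.write(r.text)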

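Open the test.html file in a web browser and compare it with the live page. test.html is what your code sees, which is different from what a person sees in a web browser with a full GUI.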
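The comparison can also be done in code. The sketch below checks whether a price element exists in the saved HTML at all. The CSS selector `span.price` and the function names are assumptions for illustration; the actual selector depends on the site being scraped.

```python
from bs4 import BeautifulSoup


def find_price(html, selector):
    """Return the text of the first element matching `selector`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element else None


def price_from_file(path, selector):
    """Run find_price on a saved HTML file such as test.html."""
    with open(path, encoding="utf-8") as f:
        return find_price(f.read(), selector)
```

If `price_from_file("test.html", "span.price")` returns `None` while the price is visible in the browser, the price is most likely inserted by JavaScript after the page loads, and requests alone will not see it.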

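When you scrape a website, you don't have to accept those cookies. But if you do want to accept them, you only need to click the "accept" button on the site. You can do that as follows:

Click the button with Selenium.

Right-click the page, choose Inspect, and locate the cookie button to get its XPath.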
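A minimal sketch of those two steps, assuming Selenium 4 with a local Chrome installation. The button label passed to `button_xpath` (for example `Acceptera`) and both function names are hypothetical; use the XPath copied from the browser's inspector if the button is not a plain `<button>` with that text.

```python
def button_xpath(label):
    """Build an XPath for a <button> whose visible text equals `label`.

    Assumes the label contains no double quotes.
    """
    return f'//button[normalize-space()="{label}"]'


def fetch_after_accepting(url, xpath, timeout=10.0):
    """Open `url`, click the element at `xpath` if it appears, return the HTML."""
    # Imported here so button_xpath stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        try:
            # Wait until the consent button is clickable, then click it.
            button = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, xpath))
            )
            button.click()
        except TimeoutException:
            pass  # No banner showed up within the timeout; carry on.
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can be handed to `BeautifulSoup(html, "html.parser")` as usual. Depending on the site, the price may need a moment to render after the click, so a second explicit wait on the price element may be required.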


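Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.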
