
Scraping information from multiple URLs that differ in structure

Posted 2021-05-11 09:39:33 | Tags: python, web, web-scraping, beautifulsoup

I would like to scrape multiple URLs, but they differ in nature, for example different company websites with different HTML backends. Is there a way to do this without writing custom code for each URL?

I understand that I can put multiple URLs into a list and loop over them.

1 answer

I fear not, but I am not an expert :-)

I could imagine that it depends on the complexity of the structures. If you want to find the text "Test" on every website, I could imagine that soup.body.findAll(text='Test') would return all occurrences of "Test" on the page.
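A minimal sketch of that idea, assuming BeautifulSoup 4 (`bs4`) is installed. The sample HTML is made up for illustration, and `find_all(string=...)` is the current spelling of the older `findAll(text=...)`. One caveat worth seeing in action: a plain string only matches text nodes that are exactly equal to it, so a regular expression may be closer to what is wanted when the term appears inside longer text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for HTML fetched from any site.
html = """
<html><body>
  <h1>Test</h1>
  <p>This is a Test page.</p>
  <div><span>Another test</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string matches only text nodes that are exactly "Test".
exact = soup.body.find_all(string="Test")

# A compiled regex matches every text node that contains the term.
partial = soup.body.find_all(string=re.compile("test", re.IGNORECASE))

print([str(s) for s in exact])
print([str(s) for s in partial])
```

Because neither call names a tag, class, or id, the same lines work no matter how a given page is laid out, which is what makes this kind of text search usable across unrelated sites.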

I assume you already know how to loop through a list, so you would loop through the list of URLs and check, for each one, whether the searched string occurs (or maybe you are looking for something else, e.g. an "apply" button or "login"?).

All the best,
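The loop described here could look roughly like the sketch below. It uses only the Python standard library so it runs without extra installs; the names `TextCollector`, `page_contains`, `fetch`, and `check_urls` are hypothetical helpers invented for this example, and the UTF-8 decoding and User-Agent header are assumptions that real sites may not satisfy.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextCollector(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def page_contains(html, term):
    """Return True if `term` occurs (case-insensitively) in the page text."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return term.lower() in " ".join(collector.parts).lower()


def fetch(url):
    """Download a page as text (assumes UTF-8, which may not always hold)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def check_urls(urls, term, fetcher=fetch):
    """Map each URL to True/False, or None if the page could not be loaded."""
    results = {}
    for url in urls:
        try:
            results[url] = page_contains(fetcher(url), term)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            results[url] = None
    return results
```

Calling `check_urls(list_of_urls, "login")` would return a dictionary keyed by URL. The `fetcher` parameter exists only so the download step can be swapped out, for instance for a browser-driven fetch on JavaScript-heavy pages or for canned HTML in tests. Anything more specific than "does this text occur" (prices, addresses, tables) would most likely still need per-site selectors.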

Related questions

Scraping multiple single pages from different domains (mostly) with different structure

Scraping tables from multiple URLs

Scraping different variables from multiple URLs into one single CSV file using Python

Looping multiple URLs <python scraping issue> (2 different URLs from same website)

Selenium scraping with multiple URLs

Selenium - web scraping multiple URLs for same contents but slightly different XPaths

Scraping multiple URLs from same website, multiple pages

Scraping text data from different domain URLs using Python

Sequential scraping from multiple start_urls leading to error in parsing

Python Scrapy - Scraping data from multiple website URLs


