Scraping text data from URLs on different domains using Python

Question

Is there any way to scrape only the text data from URLs on different domains in Python?

For example, on this website the text is in a different block than on this page. I would like to write a single function that lets me scrape the text from both of these websites at the same time. Is that possible in Python?

Answer 1

The one approach that works regardless of a site's layout is to scrape the whole text of the page. You can do that with the following code:

import requests
from bs4 import BeautifulSoup

# Download the page
r = requests.get('https://www.businessinsider.in/tech/news/airbnb-is-getting-ripped-apart-for-asking-renters-to-donate-money-to-landlords/articleshow/76968577.cms')
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(r.text, 'html.parser')
# Take all the text found under the <html> element
text = soup.find('html').text
print(text)
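
The same idea can be wrapped in a single function that accepts several URLs, as the question asks. The sketch below is a variation and not the code from the answer: it swaps requests and BeautifulSoup for the standard library (urllib.request and html.parser) so it runs without extra packages, and it explicitly skips the contents of script and style tags. The names VisibleTextParser, extract_visible_text and scrape_text are hypothetical, and the User-Agent header and the 10 second timeout are assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring the contents of non-visible tags."""

    # Tags whose contents are not meant to be read as page text
    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def get_text(self):
        return "\n".join(self._chunks)


def extract_visible_text(html):
    """Return the text of an HTML string, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.get_text()


def scrape_text(urls, timeout=10):
    """Download each URL and return a dict mapping URL to page text."""
    results = {}
    for url in urls:
        # Some sites reject requests without a browser-like User-Agent (assumption)
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
        results[url] = extract_visible_text(html)
    return results
```

Calling scrape_text with a list of URLs from different domains returns one text string per URL, regardless of which block holds the text on each site. Because this takes everything on the page, menus and footers come along with the article body, and text that a page only renders through JavaScript will not appear.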

Answered 2020-07-15 12:43:38
