Scraping text data from URLs on different domains using Python

Is there any way to scrape only the text data from URLs on different domains in Python?

For example, on this website the text is in a different block than on this page. I would like to write a single function that lets me scrape the text from both of these websites. Is that possible in Python?

The simplest approach that works across different domains is to scrape the whole text of each page, since that does not depend on any site-specific markup. You can do that with the following code:

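import requests
from bs4 import BeautifulSoup

url = 'https://www.businessinsider.in/tech/news/airbnb-is-getting-ripped-apart-for-asking-renters-to-donate-money-to-landlords/articleshow/76968577.cms'
r = requests.get(url, timeout=10)
r.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(r.text, 'html.parser')
# get_text() gathers every text node in the parsed document
text = soup.get_text(separator='\n', strip=True)
print(text)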
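
Since the question asks for a single function that handles several sites, the same idea can be wrapped up roughly as follows. This is only a sketch under some assumptions: the names `extract_text` and `scrape_text` are made up for illustration, and dropping `<script>`, `<style>` and `<noscript>` tags is an extra step not in the code above, added because their contents are usually not readable text.

```python
import requests
from bs4 import BeautifulSoup


def extract_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: script/style/noscript contents are not wanted as "text".
    for tag in soup(['script', 'style', 'noscript']):
        tag.decompose()
    return soup.get_text(separator='\n', strip=True)


def scrape_text(urls, timeout=10):
    """Fetch each URL and map it to its page text (hypothetical helper)."""
    texts = {}
    for url in urls:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise on an HTTP error status
        texts[url] = extract_text(response.text)
    return texts
```

Calling `scrape_text([url_a, url_b])` with two URLs from different domains would return a dictionary keyed by URL. Because nothing here depends on a particular block or class name, the same function works for both layouts, at the cost of also returning navigation, footer and other non-article text.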


 