
How to extract information from HTML elements using Python

I would like a Python script that extracts the information from the href attribute of HTML link elements. Here is the example HTML code:

<link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('4c33088a-08e8-4422-9f3d-ed65411889ef')/Products('Quicklook')/$value"/>
<id>4c33088a-08e8-4422-9f3d-ed65411889ef</id>

The https link in the href attribute is a download link, and I would like a Python script that automates the download. I tried extracting the data with the selenium and requests libraries, but I could not make any progress. Is there a way to solve this?

Thank you for your answers.
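For the extraction step, a minimal sketch using only the standard library's html.parser module (rather than selenium) could look like this. It assumes the markup is already available as a string, for example the text of a page fetched with requests. LinkHrefParser and extract_link_hrefs are hypothetical helper names, and the optional rel filter simply matches the rel="icon" attribute in the sample above.

```python
from html.parser import HTMLParser


class LinkHrefParser(HTMLParser):
    """Hypothetical helper: collect the href of every <link> tag, optionally filtered by rel."""

    def __init__(self, rel=None):
        super().__init__()
        self.rel = rel
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased
        if tag != "link":
            return
        attributes = dict(attrs)
        if self.rel is not None and attributes.get("rel") != self.rel:
            return
        href = attributes.get("href")
        if href:
            self.hrefs.append(href)

    # A self-closing <link .../> is reported via handle_startendtag, whose
    # default implementation calls handle_starttag, so it is covered too.


def extract_link_hrefs(markup, rel=None):
    # Assumption: the markup is already in memory as a string.
    parser = LinkHrefParser(rel)
    parser.feed(markup)
    parser.close()
    return parser.hrefs


snippet = """<link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('4c33088a-08e8-4422-9f3d-ed65411889ef')/Products('Quicklook')/$value"/>
<id>4c33088a-08e8-4422-9f3d-ed65411889ef</id>"""

for url in extract_link_hrefs(snippet, rel="icon"):
    print(url)
```

Passing rel=None returns the href of every link element, which is handy when the page contains several download links.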

This article might help: https://www.tutorialspoint.com/downloading-files-from-web-using-python. It describes a very simple solution: use the requests package to fetch the content at the URL you want, then write it to a file. Here is a short example based on the tutorial:

import requests

url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)
# A "with" block closes the file as soon as the write finishes
with open('facebook.ico', 'wb') as f:
    f.write(r.content)
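For large files, a stdlib-only variant of the same idea streams the response to disk in chunks instead of holding the whole body in r.content. This sketch swaps requests for urllib.request and shutil.copyfileobj; the download function and its chunk_size parameter are hypothetical. With requests itself, the equivalent is requests.get(url, stream=True) combined with r.iter_content(chunk_size=...).

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, destination, chunk_size=64 * 1024):
    """Hypothetical helper: stream the resource at url into destination, return its size in bytes."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # copyfileobj copies chunk_size bytes at a time, so the whole
        # file never has to fit in memory at once.
        shutil.copyfileobj(response, out, chunk_size)
    return destination.stat().st_size
```

For example, download('https://www.facebook.com/favicon.ico', 'facebook.ico') fetches the same file as the snippet above. One difference worth knowing: urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, whereas requests.get returns the error response unless you call r.raise_for_status().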
