
python html extract tags

Original post 2010-08-17 14:12:20 8 2 python/ html

How would it be possible to do the following:

Scan through an HTML page (preferably through a whole domain, such as www.python.org) and extract all

h1, h2, ... hn tags

and write all headings to a file, in the correct order:

start with h1, then h2,

until we reach the next h1.

2 answers

Use BeautifulSoup or PyQuery.
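As a minimal sketch of the BeautifulSoup route, assuming the `beautifulsoup4` package is installed and the HTML is already in a string: `find_all` with a compiled regular expression returns the matching heading tags in document order, which gives the "h1, then its h2s, until the next h1" sequence the question asks for. The function names and the indented output format are illustrative choices, not part of the original answer.

```python
import re

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Matches the tag names h1 through h6 and nothing else.
HEADING_RE = re.compile(r"^h[1-6]$")


def extract_headings(html):
    """Return (level, text) pairs for every h1..h6 tag, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for tag in soup.find_all(HEADING_RE):
        level = int(tag.name[1])                  # "h2" -> 2
        text = tag.get_text(" ", strip=True)      # flatten nested markup
        headings.append((level, text))
    return headings


def format_outline(headings):
    """Render one heading per line, indented by level (hypothetical format)."""
    return "".join(
        "  " * (level - 1) + f"h{level}: {text}\n" for level, text in headings
    )


def write_outline(headings, path):
    """Write the formatted outline to a text file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(format_outline(headings))
```

Calling `write_outline(extract_headings(page_html), "headings.txt")` would then save the outline for one page, where `page_html` stands for whatever HTML string you have downloaded.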

Given the requirement to scan a whole website, you might want to look into pycurl to grab the files to scrape. Be careful not to hit the site with the equivalent of a DoS attack though.
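A rough sketch of a site-wide scan that stays polite might look like the following. Note that it swaps pycurl for the standard library's `urllib.request` so that it has no extra dependencies. The names `crawl`, `fetch_html`, `delay`, and `max_pages` are assumptions made for illustration. The crawl stays on the starting host, pauses between requests, and stops after a fixed number of pages.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download one page; assumes UTF-8 and replaces undecodable bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, delay=1.0, max_pages=20):
    """Breadth-first crawl restricted to the start URL's host.

    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return pages
```

Each HTML string in the returned dict can then be passed to whatever heading extractor you use. Raising `delay` and lowering `max_pages` are the simplest ways to reduce the load on the server.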

Related questions

Python Regex Extract Text Within HTML Tags

How to extract td HTML tags with Python BeautifulSoup?

How to extract the highlighted info from the html tags using python?

extract specific tags from html using python beautiful soup

How to extract tags from the following HTML via Python

extract data between html tags using BeautifulSoup in python

Extract html data from tags using beautifulsoup python

Extract information from html tags using beautiful soup python

How to extract html between two different tags in python?

Python: Regular expression to extract text between any two tags in a html
