
How to extract all text from HTML, excluding CSS and JavaScript, with lxml in Python?

Original post: 2019-10-17 13:16:44. Tags: python, xpath, python-requests, lxml

How can I extract all text from an HTML page, excluding any CSS and JavaScript?

I am trying the following code:

import requests
from lxml import html

r = requests.get(website)
tree = html.fromstring(r.text)
html_text = tree.xpath('//text()')

But this also returns the contents of every CSS and JavaScript block on the page, because //text() selects all text nodes in the document, including the ones inside <script> and <style> elements.
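To see the problem concretely, here is a minimal sketch that runs the same //text() query on a small inline HTML string instead of a live page. The SAMPLE markup is hypothetical and stands in for r.text; it assumes lxml is installed. The script and style bodies come back alongside the visible text.

```python
from lxml import html

# Hypothetical sample page, used instead of a live requests.get() call
SAMPLE = """
<html>
  <head>
    <style>body { color: red; }</style>
    <script>var x = 1;</script>
  </head>
  <body><p>Hello <b>world</b></p></body>
</html>
"""

tree = html.fromstring(SAMPLE)

# //text() selects every text node, including those inside <script> and <style>;
# whitespace-only nodes are dropped here just to make the result readable
all_text = [t.strip() for t in tree.xpath('//text()') if t.strip()]
print(all_text)
```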

1 answer

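You can use the drop_tree() method to remove the elements you are not interested in. drop_tree() removes an element together with its children and text, while its tail text is joined to the previous element or parent, so the text that follows a removed <script> is kept.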

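import requests
from lxml import html

r = requests.get(website)
tree = html.fromstring(r.text)

# Remove every <script> and <style> element, including its contents
unwanted = tree.xpath('//script|//style')
for u in unwanted:
    u.drop_tree()

# Only text outside the removed elements remains
html_text = tree.xpath('//text()')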
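Below is a self-contained sketch of the answer's approach run on an inline string, assuming lxml is installed. The SAMPLE markup and the function names text_via_drop_tree and text_via_xpath_filter are hypothetical. The second function is an alternative that is not part of the original answer: it leaves the tree untouched and uses an XPath predicate to skip text nodes that have a <script> or <style> ancestor.

```python
from lxml import html

# Hypothetical sample page standing in for r.text from requests.get(website)
SAMPLE = """<html>
<head><style>p { color: red; }</style></head>
<body><p>Hello <b>world</b><script>var x = 1;</script> again</p></body>
</html>"""


def text_via_drop_tree(page: str) -> str:
    """The answer's approach: delete <script>/<style> subtrees, then collect text."""
    tree = html.fromstring(page)
    for el in tree.xpath('//script|//style'):
        # drop_tree() removes the element and its children but keeps its tail text
        el.drop_tree()
    return " ".join(t.strip() for t in tree.xpath('//text()') if t.strip())


def text_via_xpath_filter(page: str) -> str:
    """Alternative: keep the tree intact and filter text nodes by ancestor."""
    tree = html.fromstring(page)
    nodes = tree.xpath('//text()[not(ancestor::script) and not(ancestor::style)]')
    return " ".join(t.strip() for t in nodes if t.strip())


print(text_via_drop_tree(SAMPLE))
print(text_via_xpath_filter(SAMPLE))
```

Both functions skip the script and style bodies but keep the " again" text that follows the inline <script>. Joining stripped fragments with single spaces is a simplification: it can insert a space inside a word split across inline tags. If you want the original spacing, calling tree.text_content() after the drop_tree() loop returns the remaining text as a single string.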


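Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.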


Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM