
Getting non-English text from an HTML doc

Originally posted 2022-07-19 07:34:34 · Tags: python, html, utf-8

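I'm trying to get the title of an HTML document in Python, but I get strange symbols instead. I guess it's because of the encoding, but the HTML document is encoded in UTF-8. Is there any way to get normal letters?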

Here is the code and what I get:

from bs4 import BeautifulSoup

with open("index.html") as file:
    src = file.read()

soup = BeautifulSoup(src, "lxml")
title = soup.title.text
print(title)

Р“Р»Р°РІРЅР°СЏ СЃС‚СЂР°РЅРёС†Р°
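The garbled title is consistent with one specific mix-up: the file's UTF-8 bytes being decoded as Windows-1251 (cp1251), the default ANSI code page on Russian-language Windows systems. The sketch below reproduces that mix-up with plain strings. The Russian title and the cp1251 codec are assumptions inferred from the output shown above, not something the question states.

```python
import locale

# Assumed original title, recovered by reversing the garbling shown above.
title = "Главная страница"

# Simulate open() without encoding= on a machine whose locale codec is cp1251:
# the UTF-8 bytes on disk get decoded as cp1251 characters.
garbled = title.encode("utf-8").decode("cp1251")
print(garbled)

# Reversing the wrong decode recovers the original text in this case.
repaired = garbled.encode("cp1251").decode("utf-8")
print(repaired == title)  # True

# The codec open() falls back to when encoding= is omitted (locale dependent).
print(locale.getpreferredencoding(False))
```

In current Python 3 versions (unless UTF-8 mode is enabled), `open()` in text mode without `encoding=` decodes with the locale's preferred encoding, which is why the same script can print correct text on one machine and mojibake on another.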

1 answer

You need to specify the encoding when opening the file:

with open("index.html", encoding='utf-8') as file:
    src = file.read()
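A self-contained sketch of the fix, assuming a small hypothetical `index.html` written to a temporary directory (the page content is made up for illustration). It reads the file with an explicit `encoding="utf-8"` and also shows an equivalent alternative: reading raw bytes and decoding them yourself. Either `src` value can then be passed to `BeautifulSoup(src, "lxml")` as in the question.

```python
import os
import tempfile

# Hypothetical page content; the real index.html is not shown in the question.
html_text = "<html><head><title>Главная страница</title></head><body></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")

    # Save the page as UTF-8, matching what the question says about the file.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_text)

    # The answer's fix: name the encoding instead of relying on the locale default.
    with open(path, encoding="utf-8") as file:
        src = file.read()

    # Equivalent alternative: read bytes and decode them explicitly.
    with open(path, "rb") as file:
        src_from_bytes = file.read().decode("utf-8")

print(src == html_text)             # True
print(src_from_bytes == html_text)  # True
```

BeautifulSoup can also be handed the raw bytes and will try to detect the encoding itself, but stating the encoding explicitly, as the answer does, gives predictable results regardless of the machine's locale.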

Related questions

Separating English text and non-English Text from a file

Removing non-English words from text using Python

Python - Convert Octal to non-English Text from file

Dropping non-English text with langdetect

processing non-english text in python 2.7

How to store non-english text?

How to read a non-English language text from a text file and print it in python?

spacy-udpipe with pytextrank to extract keywords from non-English text

MoviePy cannot display non-English text properly

How to make python's argparse generate Non-English text?


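Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.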


