Python - BeautifulSoup - German characters in HTML

Dear friendly Python experts,

I am using BeautifulSoup to scrape some HTML text from a site. The site contains German words, such as "Groß" or "Bär". When I print the HTML text, these characters come out garbled, which makes it too hard to search the text for those words.

How can I replace ß with ss, ä with ae, ü with ue, and ö with oe in the HTML text?

I have looked everywhere for a solution to this, but it got me nowhere except confusion land.

As this is for a project, help is very much appreciated!

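When reading the text, assign it to a variable and decode it. For example, if your text is stored in a variable Var, use Var.decode("utf-8") when reading it. Note that this only works when Var holds bytes; in Python 3 a str is already decoded and has no decode method.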
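
To make this concrete, here is a minimal sketch using only the standard library. It assumes the page really is UTF-8 encoded and that you have it as bytes; the sample bytes, the `UMLAUT_TABLE` name, and the `transliterate` helper are made up for illustration and are not part of BeautifulSoup. It decodes first, then applies the replacements asked about in the question.

```python
# Hypothetical sample: raw UTF-8 bytes, standing in for a downloaded page body.
raw = "<p>Groß und Bär</p>".encode("utf-8")

# Decode the bytes into a str so the umlauts become real characters.
html = raw.decode("utf-8")

# Translation table mapping German special characters to ASCII digraphs.
UMLAUT_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate(text: str) -> str:
    """Replace German umlauts and ß with their ASCII spellings."""
    return text.translate(UMLAUT_TABLE)


print(transliterate(html))  # <p>Gross und Baer</p>
```

If the text is decoded correctly, you may not need the replacement at all, since you can then search for "Groß" or "Bär" directly. If the output still looks garbled after decoding, the page is probably not UTF-8 and a different encoding would have to be passed to decode.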
