
How to replace HTML codes in an HTML file using Python?

I'm trying to replace all HTML codes (character entities) in my HTML file using a for loop (I'm not sure this is the easiest approach) without changing the formatting of the original file. When I run the code below, the codes are not replaced. Does anyone know what could be wrong?

import re
tex=open('ALICE.per-txt.txt', 'r')

tex=tex.read()




for i in tex:
  if i =='&otilde;':
      i=='õ'
  elif i == '&ccedil;':
      i=='ç'



with open('Alice1.replaced.txt', "w") as f:
    f.write(tex)
    f.close()

You can use html.unescape, which converts named and numeric character references to the corresponding Unicode characters.

>>> import html
>>> html.unescape('&otilde;')
'õ'
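As a small illustration of what html.unescape covers, the sketch below decodes the same character written as a named, a decimal, and a hexadecimal reference, and shows that surrounding whitespace and newlines are left alone. The sample strings are made up for this example and are not taken from the asker's file.

```python
import html

# Named, decimal, and hexadecimal references all decode to the same character.
samples = ["&otilde;", "&#245;", "&#xf5;", "&ccedil;"]
decoded = [html.unescape(s) for s in samples]
print(decoded)

# Text that is not a character reference (spaces, newlines) is passed through
# unchanged, so the layout of the file is preserved.
line = "  informa&ccedil;&otilde;es\n"
print(repr(html.unescape(line)))
```

Because the whole string is processed in one call, there is no need to loop over individual characters or list each entity by hand.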

With your code:

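import html

# Read the whole file; the with block closes it automatically.
with open('ALICE.per-txt.txt', 'r') as f:
    html_text = f.read()

# Replace every HTML character reference with the character it stands for.
html_text = html.unescape(html_text)

# Write the result to the output file used in the question,
# so the original file is not overwritten.
with open('Alice1.replaced.txt', 'w') as f:
    f.write(html_text)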

Please note that I opened the files with a with statement. This takes care of closing the file after the with block, which your code did not do for the file it read.
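The same idea can be wrapped in a small helper. This is only a sketch: the function name unescape_file, the UTF-8 default encoding, and the use of newline="" (to keep the original line endings untouched) are assumptions added for illustration, not part of the answer above.

```python
import html


def unescape_file(src, dst, encoding="utf-8"):
    """Read src, decode HTML character references, and write the result to dst.

    The encoding default is an assumption; pass the file's real encoding.
    Returns True if the output file was closed by the with block.
    """
    # newline="" disables newline translation, so line endings are preserved.
    with open(src, "r", encoding=encoding, newline="") as f:
        text = f.read()

    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(html.unescape(text))

    # After the with block the file object is already closed,
    # so no explicit f.close() is needed.
    return f.closed
```

Calling unescape_file('ALICE.per-txt.txt', 'Alice1.replaced.txt') would then read the first file and write the decoded text to the second, leaving the input untouched.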
