
Appending an existing tag to soup causes angled brackets to become HTML entities

I'm trying to write a BeautifulSoup object to a file. Before writing, I append something to the soup object: a div containing HTML/JavaScript from Plotly's to_html() function, which gives me a chart in HTML form. I narrowed the problem down to the following code:

from bs4 import BeautifulSoup

file_writer = open("path/to/file", "w")
html_outline = """<html>
                      <head></head>
                          <body>
                              <p>Hello World!</p>
                              <div></div>
                          </body>
                      </html>"""
soup = BeautifulSoup(html_outline, "html.parser")
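soup.div.append(plotly_html)  # plotly_html: the HTML/JavaScript string returned by Plotly's to_html()
file_writer.write(str(soup))  # write() expects a string, not a soup object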
file_writer.close()

Inside the write call, I've tried various ways of converting the soup object to a string, such as str(soup), soup.prettify(), and others I've forgotten. Those do write to the file successfully, but the angled brackets ("<>") from the Plotly HTML I insert become HTML entities (I believe that's what they're called), so a

<div>

becomes:

&lt;div&gt;

inside the file I write to. Note that only the angled brackets of the HTML I appended to the soup object turn into HTML entities; the html, head, and body tags all keep their proper angled brackets.
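The behaviour described above can be reproduced with a short sketch. It assumes a small stand-in string (the hypothetical `chart_html`) in place of the real Plotly output. A plain Python string passed to append() is stored as a text node rather than parsed into tags, which would explain why only the appended markup is escaped on output:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<html><body><p>Hello World!</p><div></div></body></html>", "html.parser"
)

# Hypothetical stand-in for the string returned by Plotly's to_html()
chart_html = "<div id='chart'><script>render();</script></div>"

# Appending a plain str stores it as text, not as tags
soup.div.append(chart_html)

appended = soup.div.contents[0]
print(type(appended).__name__)  # NavigableString, i.e. a text node

# On output, the default formatter escapes "<" and ">" inside text nodes
output = str(soup)
print(output)
```

The tags that came from the original outline were parsed by BeautifulSoup, so they are written out as real tags; the appended string never went through the parser.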

My question is: how can I convert the soup object directly into a string that has proper angled brackets and no HTML entities?

I could probably write a function that scans the file for those HTML entities and replaces them with proper angled brackets, but I'm hoping there's a better solution before I do that. I searched for this problem several times but nothing came up.

I asked this question previously and it was marked as a duplicate, but the linked duplicate didn't help because it was about adding empty tags. Here I'm appending a whole existing div, with JavaScript and other content, to my soup object.

Thanks in advance!

I found out that I could use bs4's .prettify() function, but I had to change the formatter to None. So the line of code that writes the HTML to the file becomes:

file_writer.write(soup.prettify(formatter=None))

This isn't best practice: according to bs4's docs, formatter=None may generate invalid HTML/XML. I know the docs say that HTML entities are converted to Unicode characters by default, so I'm not sure why that didn't work for me. I'm no longer in urgent need of a solution, but I posted this because someone may find it useful in the future. Hopefully someone can offer a better solution, though!
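One possible alternative that keeps the default formatter is to parse the HTML string into its own BeautifulSoup object and append that, so the content enters the tree as tags rather than as text. The following is a minimal sketch, not a confirmed fix for the real Plotly output. `chart_html` is a hypothetical stand-in for the string returned by to_html(), and the outline is written without indentation to keep the result easy to read:

```python
from bs4 import BeautifulSoup

html_outline = """<html>
<head></head>
<body>
<p>Hello World!</p>
<div></div>
</body>
</html>"""
soup = BeautifulSoup(html_outline, "html.parser")

# Hypothetical stand-in for the string returned by Plotly's to_html()
chart_html = '<div id="chart"><script>render();</script></div>'

# Parse the string first so bs4 sees tags, not a block of text
fragment = BeautifulSoup(chart_html, "html.parser")
soup.div.append(fragment)

# The default formatter is still in effect; only real text would be escaped
result = str(soup)
print(result)
```

The `result` string can then be passed to file_writer.write() as before. Because the appended content is now made of real tags, formatter=None should no longer be needed, and any genuine text containing "<" or "&" is still escaped safely.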
