Preserve charset attribute of meta tag in HTML Blob?

I am generating a client-side HTML redirect like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Déjà vu - Wikipedia</title>
    <script type='text/javascript'>
      document.addEventListener('DOMContentLoaded', function () {
        var newHTML = document.createElement('html');
        var newHead = document.createElement('head');
        var newMeta = document.createElement('meta');
        var newTitle = document.createElement('title');
        newTitle.text = "Déjà vu - Wikipedia";
        newMeta.httpEquiv = "refresh";
        newMeta.charset = "utf-8";
        newMeta.content = "30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu";
        var newBody = document.createElement('body');
        var newPar = document.createElement('p');
        var newText = document.createTextNode('Loading Déjà vu - Wikipedia...');
        newPar.appendChild(newText);
        newBody.appendChild(newPar);
        newHead.appendChild(newMeta);
        newHead.appendChild(newTitle);
        newHTML.append(newHead);
        newHTML.append(newBody);
        var tempAnchor = window.document.createElement('a');
        HTMLBlob = new Blob([newHTML.outerHTML], {type: 'text/html; charset=UTF-8'});
        tempAnchor.href = window.URL.createObjectURL(HTMLBlob);
        tempAnchor.download = "example-redirect.html";
        tempAnchor.style.display = 'none';
        document.body.appendChild(tempAnchor);
        tempAnchor.click();
        document.body.removeChild(tempAnchor);
      });
    </script>
  </head>
  <body>
  </body>
</html>

However, I am losing the charset meta attribute when I do so. The output looks like this:

<html><head><meta http-equiv="refresh" content="30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu"><title>Déjà vu - Wikipedia</title></head><body><p>Loading Déjà vu - Wikipedia...</p></body></html>

This means that my browser is not sure what encoding to use, and does not display the accents correctly.

Loading DÃ©jÃ  vu - Wikipedia...

This, on the other hand, properly shows the accents:

<html><head><meta http-equiv="refresh" charset="utf-8" content="30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu"><title>Déjà vu - Wikipedia</title></head><body><p>Loading Déjà vu - Wikipedia...</p></body></html>

Loading Déjà vu - Wikipedia...

I've reduced it to as minimal an example as I can, and it still occurs.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>title</title>
    <script type='text/javascript'>
      document.addEventListener('DOMContentLoaded', function() {
        var newHTML = document.createElement('html');
        var newHead = document.createElement('head');
        var newMeta = document.createElement('meta');
        newMeta.charset = "utf-8";
        newHead.appendChild(newMeta);
        newHTML.append(newHead);
        var tempAnchor = window.document.createElement('a');
        HTMLBlob = new Blob([newHTML.outerHTML], { type: 'text/html; charset=UTF-8' });
        tempAnchor.href = window.URL.createObjectURL(HTMLBlob);
        tempAnchor.download = "minimal-output.html";
        tempAnchor.style.display = 'none';
        document.body.appendChild(tempAnchor);
        tempAnchor.click();
        document.body.removeChild(tempAnchor);
      });
    </script>
  </head>
  <body>
  </body>
</html>

Here is the output:

<html><head><meta></head></html>

This occurs in both Firefox 63.0 and Chromium 70.0. Here is a link to the Git repo:

https://github.com/nbeaver/stackoverflow_question_2018-11-07

How can I preserve the charset attribute of an HTML blob?

The DOM interface for HTML <meta> elements currently has no dedicated property for setting the charset attribute. See the specification: https://www.w3.org/TR/html5/document-metadata.html#the-meta-element

newMeta.charset = "utf-8"; only adds your own arbitrary charset property to the newMeta JavaScript object. This arbitrary property has no effect on the charset HTML attribute of the <meta> element, which is why the serialized output contains a bare <meta>.

You need to set the charset attribute like this: newMeta.setAttribute("charset", "utf-8");
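As a rough illustration of that fix, here is a minimal sketch of the reduced example from the question with the property assignment swapped for setAttribute. The function name buildMinimalMarkup and its doc parameter are hypothetical and only there so the snippet can be read on its own; in the question's code the equivalent change is a single line.

```javascript
// Sketch: build the minimal document from the question, but write charset
// as a content attribute. `buildMinimalMarkup` and `doc` are hypothetical
// names; `doc` is expected to be a DOM Document.
function buildMinimalMarkup(doc) {
  var newHTML = doc.createElement('html');
  var newHead = doc.createElement('head');
  var newMeta = doc.createElement('meta');

  // setAttribute writes the content attribute, so outerHTML serializes it.
  // Assigning newMeta.charset would only create a plain JavaScript property.
  newMeta.setAttribute('charset', 'utf-8');

  newHead.appendChild(newMeta);
  newHTML.appendChild(newHead);
  return newHTML.outerHTML;
}

// Browser-only usage; skipped where no DOM is available.
if (typeof document !== 'undefined') {
  console.log(buildMinimalMarkup(document));
}
```

In a browser this should log <html><head><meta charset="utf-8"></head></html> instead of the bare <meta> shown in the question.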

According to this answer to "Set charset meta tag with JavaScript":

You can't set the charset content attribute by setting the charset property because they don't reflect each other. In fact there is no property that reflects the charset content attribute. [...] The character set is established by the parser, so constructing the meta element in JavaScript after the HTML has been parsed will have no effect on the character set of the document at all.

However, in your case, prepending a UTF-8 BOM to the blob might do the trick.

HTMLBlob = new Blob(["\ufeff",newHTML.outerHTML], {type: 'text/html; charset=UTF-8'});
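To see why this can help, the sketch below shows which bytes the "\ufeff" string part contributes to the file. It assumes an environment with TextEncoder available; bomPrefixedParts and utf8Bytes are hypothetical helper names introduced only for this illustration. Blob encodes string parts as UTF-8, so TextEncoder is used here as a stand-in to inspect the same bytes synchronously.

```javascript
// Sketch: why prepending "\ufeff" marks the downloaded file as UTF-8.
// `bomPrefixedParts` and `utf8Bytes` are hypothetical helper names.

// Blob parts with U+FEFF (the byte order mark) placed first.
function bomPrefixedParts(html) {
  return ['\ufeff', html];
}

// Blob stores string parts as UTF-8; TextEncoder performs the same encoding,
// so it shows the bytes that end up at the start of the file.
function utf8Bytes(parts) {
  return new TextEncoder().encode(parts.join(''));
}

var parts = bomPrefixedParts('<html><head><meta></head></html>');
var bytes = utf8Bytes(parts);

// U+FEFF encodes to the three-byte UTF-8 BOM.
console.log(Array.from(bytes.slice(0, 3), function (b) {
  return b.toString(16);
}).join(' ')); // ef bb bf

// Browser-only: the Blob itself is built from the same parts.
if (typeof document !== 'undefined') {
  var HTMLBlob = new Blob(parts, {type: 'text/html; charset=UTF-8'});
}
```

Because the saved file then starts with EF BB BF, a browser opening it can detect UTF-8 from the file contents alone, even when the markup carries no charset declaration. Combining the BOM with setAttribute("charset", "utf-8") is also possible if you want both signals.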
