How do I use nsIParserUtils in main.js with Firefox Add-on SDK 1.10?

Question

My recent submission to the Firefox add-on site (built on Firefox Add-on SDK 1.10) was rejected because I had not sanitized the input I use, and it was suggested that I use nsIParserUtils.

I found the function parseHTML(doc, html, allowStyle, baseURI, isXML) on that page. I changed it to:

function parseHTML(doc, html, allowStyle, baseURI, isXML) {
    var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
    var f =  parser.parseFragment(html, allowStyle ? parser.SanitizerAllowStyle : 0,
                                        !!isXML, baseURI, doc);
    return f;
}

The first parameter is said to be a document element, and I have no idea what that is supposed to be. I tried document.createDocumentFragment(), but I get a "ReferenceError: document is not defined" error. Can someone help me with how to call this function?

Also, the function returns an nsIDOMDocumentFragment. How do I convert that back to a string?

UPDATE:

As suggested by @zer0, I used:

var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
var sanitizedHTML = parser.sanitize(html, flags);

But the result defeats the purpose of what I wanted to do. For example, this:

<html><head><BASE href='http://localhost/t/h.html' />
<link rel="stylesheet" type="text/css" href="h.css">
<style type="text/css">
.b{
    color:green;
}
</style>
<base href="http://foo.example.com/">
</head><body>Sample Text. No Style
<script>Hello malicious code</script>
<p class="a">External Style</p>
<p class="b">Internal Style</p>
<p style="color:blue">Inline Style</p>

<a href="sample.html">Link</a><br><br><div style='color: #666666; font-size: 12px'>Clipped on 6-October-2012, 07:37:39 PM from <a href='http://localhost/t/h.html'>http://localhost/t/h.html</a> </div></body></html>

is converted to:

<html><head>  


<style type="text/css">
.b{

    color:green;
}
</style>



</head><body>Sample Text. No Style

<p class="a">External Style</p>
<p class="b">Internal Style</p>
<p style="color:blue">Inline Style</p>

<a>Link</a><br><br><div style="color: #666666; font-size: 12px">Clipped on 6-October-2012, 07:37:39 PM from <a href="http://localhost/t/h.html">http://localhost/t/h.html</a> </div></body></html>

Because this strips the external hyperlinks and CSS, it defeats the purpose of the add-on itself. What I want is for only the scripts to be removed:

<html><head><BASE href='http://localhost/t/h.html' />
<link rel="stylesheet" type="text/css" href="h.css">

<style type="text/css">
.b{

    color:green;
}
</style>
<base href="http://foo.example.com/">


</head><body>Sample Text. No Style
<p class="a">External Style</p>
<p class="b">Internal Style</p>
<p style="color:blue">Inline Style</p>

<a href="sample.html">Link</a><br><br><div style='color: #666666; font-size: 12px'>Clipped on 6-October-2012, 07:37:39 PM from <a href='http://localhost/t/h.html'>http://localhost/t/h.html</a> </div></body></html>

Can someone shed some light on this?

Answer 1

Links to external styles are removed for a reason: external styles cannot be validated and they might be dangerous (in particular, -moz-binding can be used to run code). The sanitizer also assumes that you might put the HTML code into a location where following relative links isn't safe (such as mail messages in Thunderbird), so relative links are dropped. Absolute links are always fine, however.
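To make the relative-versus-absolute distinction concrete, here is a minimal sketch of resolving a link against the page it was clipped from. It assumes an environment that provides the standard URL constructor, and resolveHref is a hypothetical helper name, not part of nsIParserUtils.

```javascript
// Hypothetical helper: turn a possibly relative href into an absolute one.
// Assumes the standard URL constructor is available in the running environment.
function resolveHref(href, baseURL) {
    try {
        return new URL(href, baseURL).href;
    } catch (e) {
        // Unparseable link or base: return null so the caller can decide to drop it
        return null;
    }
}

// The relative link from the question's sample, resolved against the clipped page
var absolute = resolveHref("sample.html", "http://localhost/t/h.html");
```

A link rewritten this way is already absolute by the time the sanitizer sees it, which is the property the sanitizer relies on.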

What you might want to do is preprocess the HTML code to remove these issues: resolve relative links and inline the external stylesheets that the document references. Something like this:

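// Cc and Ci are not globals in an SDK module; get them from the "chrome" module
var {Cc, Ci} = require("chrome");

// Parse the HTML code into a temporary document
var doc = Cc["@mozilla.org/xmlextras/domparser;1"]
               .createInstance(Ci.nsIDOMParser)
               .parseFromString(html, "text/html");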

// Make sure all links are absolute
for (var i = 0; i < doc.links.length; i++)
    doc.links[i].setAttribute("href", doc.links[i].href);

// Make sure all stylesheets are inlined
var stylesheets = doc.getElementsByTagName("link");
for (i = 0; i < stylesheets.length; i++)
{
    try
    {
        var request = new XMLHttpRequest();
        request.open("GET", stylesheets[i].href, false);
        request.send(null);
        var style = doc.createElement("style");
        style.setAttribute("type", "text/css");
        style.textContent = request.responseText;
        stylesheets[i].parentNode.replaceChild(style, stylesheets[i]);
        i--;
    }
    catch (e)
    {
        // Ignore download errors
    }
}

// Serialize the document into a string again
html = Cc["@mozilla.org/xmlextras/xmlserializer;1"]
         .createInstance(Ci.nsIDOMSerializer)
         .serializeToString(doc.documentElement);

// Now sanitize the HTML code
var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
var sanitizedHTML = parser.sanitize(html, parser.SanitizerAllowStyle);

Note that I used a synchronous XMLHttpRequest to download the stylesheet contents. This was done only for simplicity; your final code should use asynchronous downloads (most likely via the request module) so that it does not hang the user interface.
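As a rough illustration of the asynchronous variant, the sketch below downloads several stylesheets in parallel and calls back once all of them have finished. The names fetchAllStylesheets and fetchText are hypothetical; fetchText stands in for whatever downloader you use and is assumed to call its callback with the text on success or null on failure.

```javascript
// Hypothetical helper: download every URL without blocking, then report all results.
// fetchText(url, callback) is an assumed downloader interface: it must call
// callback(text) on success or callback(null) on failure, exactly once.
function fetchAllStylesheets(urls, fetchText, done) {
    var results = new Array(urls.length);
    var pending = urls.length;

    // Nothing to download: report immediately
    if (pending === 0) {
        done(results);
        return;
    }

    urls.forEach(function (url, index) {
        fetchText(url, function (text) {
            results[index] = text;   // null marks a failed download
            pending--;
            if (pending === 0)
                done(results);       // all downloads have finished
        });
    });
}
```

In an add-on, fetchText could wrap the SDK's request module, and the done callback would replace each link element with a style element before serializing and sanitizing, skipping entries that came back as null.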

Answer 2

"And the first parameter in that is said to be a document element. I have no idea what that is supposed to be?"

You don't need that. Just use the nsIParserUtils.sanitize method, which takes a string as input and returns the sanitized version as output:

var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
var sanitizedHTML = parser.sanitize(html, flags);

Check the "Constants" section of the page linked above to see which flags you need for your scenario.
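The flags are bit values that can be combined with bitwise OR. The sketch below builds a flags value from a few of the constants defined on nsIParserUtils (SanitizerAllowStyle, SanitizerAllowComments, SanitizerDropForms); buildSanitizerFlags and its options object are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: combine nsIParserUtils sanitizer constants into one flags value.
// "parser" is the nsIParserUtils service; in main.js it would be obtained with:
//   var {Cc, Ci} = require("chrome");
//   var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
function buildSanitizerFlags(parser, options) {
    var flags = 0;
    if (options.allowStyle)
        flags |= parser.SanitizerAllowStyle;     // keep style elements and attributes
    if (options.allowComments)
        flags |= parser.SanitizerAllowComments;  // keep HTML comments
    if (options.dropForms)
        flags |= parser.SanitizerDropForms;      // remove form controls
    return flags;
}
```

The result would then be passed as the second argument, for example parser.sanitize(html, buildSanitizerFlags(parser, { allowStyle: true })).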

Answer 1
3, accepted, 2012-10-12 13:39:31

Answer 2
2, 2012-10-06 10:11:02
