How to use nsIParserUtils inside firefox addon sdk 1.10 main.js?

My recent submission to the Firefox add-ons site (built with Firefox Add-on SDK 1.10) was rejected because I do not sanitize the input I use. The reviewer suggested using nsIParserUtils.

On that page I found the function parseHTML(doc, html, allowStyle, baseURI, isXML), which I changed to:

function parseHTML(doc, html, allowStyle, baseURI, isXML) {
    var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
    var f =  parser.parseFragment(html, allowStyle ? parser.SanitizerAllowStyle : 0,
                                        !!isXML, baseURI, doc);
    return f;
}

The first parameter is said to be a document element, but I have no idea what that is supposed to be. I tried document.createDocumentFragment(), but I get a "ReferenceError: document is not defined" error. Can someone help me with how to call this function?

The function also returns an nsIDOMDocumentFragment. How do I convert that back to a string?


UPDATE:

As suggested by @zer0 I used:

var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
var sanitizedHTML = parser.sanitize(html, flags);

But it defeats the purpose of what I wanted to do. For example:

<html><head><BASE href='http://localhost/t/h.html' />
<link rel="stylesheet" type="text/css" href="h.css">
<style type="text/css">
.b{
    color:green;
}
</style>
<base href="http://foo.example.com/">
</head><body>Sample Text. No Style
<script>Hello malicious code</script>
<p class="a">External Style</p>
<p class="b">Internal Style</p>
<p style="color:blue">Inline Style</p>

<a href="sample.html">Link</a><br><br><div style='color: #666666; font-size: 12px'>Clipped on 6-October-2012, 07:37:39 PM from <a href='http://localhost/t/h.html'>http://localhost/t/h.html</a> </div></body></html>

Is converted to:

<html><head>  


<style type="text/css">
.b{

    color:green;
}
</style>



</head><body>Sample Text. No Style

<p class="a">External Style</p>
<p class="b">Internal Style</p>
<p style="color:blue">Inline Style</p>

<a>Link</a><br><br><div style="color: #666666; font-size: 12px">Clipped on 6-October-2012, 07:37:39 PM from <a href="http://localhost/t/h.html">http://localhost/t/h.html</a> </div></body></html>

As this strips the external hyperlinks and CSS, it defeats the purpose of the add-on itself. What I want is for just the scripts to be removed:

<html><head><BASE href='http://localhost/t/h.html' /> <BASE href='http://localhost/t/h.html' /> 
<link rel="stylesheet" type="text/css" href="h.css">

<style type="text/css">
.b{

    color:green;
}
</style>
<base href="http://foo.example.com/">


</head><body>Sample Text. No Style
<p class="a">External Style</p>
<p class="b">Internal Style</p>
<p style="color:blue">Inline Style</p>

<a href="sample.html">Link</a><br><br><div style='color: #666666; font-size: 12px'>Clipped on 6-October-2012, 07:37:39 PM from <a href='http://localhost/t/h.html'>http://localhost/t/h.html</a> </div></body></html>

Can someone shed some light on this?

Links to external styles are removed for a reason: external styles cannot be validated and they might be dangerous (in particular, -moz-binding can be used to run code). Also, the assumption is that you could put the HTML code into a location where following relative links isn't safe (such as mail messages in Thunderbird). Absolute links are always fine however.
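As a small illustration of the relative-versus-absolute distinction, the sketch below resolves an href against the URL of the page it was taken from. It assumes the standard URL constructor is available in the environment where it runs, which may not hold for older SDK versions; toAbsolute is a hypothetical helper name, not part of any Mozilla API.

```javascript
// Resolve an href against the page it came from.
// `toAbsolute` is a hypothetical helper, not a Mozilla API.
function toAbsolute(href, baseURL)
{
    try
    {
        return new URL(href, baseURL).href;
    }
    catch (e)
    {
        // Malformed input: leave the value untouched
        return href;
    }
}

var base = "http://localhost/t/h.html";

// A relative link only has meaning together with its base
var relative = toAbsolute("sample.html", base);

// An absolute link is unaffected by the base
var absolute = toAbsolute("http://foo.example.com/x.css", base);
```

Once every href has been rewritten this way, the markup no longer depends on where it is later displayed.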

What you might want to do is preprocess the HTML code to remove these issues: resolve relative links and inline the external styles that are referenced. Something like this:

// In an SDK module, Cc and Ci come from the "chrome" module
var {Cc, Ci} = require("chrome");

// Parse the HTML code into a temporary document
var doc = Cc["@mozilla.org/xmlextras/domparser;1"]
               .createInstance(Ci.nsIDOMParser)
               .parseFromString(html, "text/html");

// Make sure all links are absolute
for (var i = 0; i < doc.links.length; i++)
    doc.links[i].setAttribute("href", doc.links[i].href);

// Make sure all stylesheets are inlined
var stylesheets = doc.getElementsByTagName("link");
for (i = 0; i < stylesheets.length; i++)
{
    try
    {
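        // main.js has no global XMLHttpRequest, so create one via XPCOM
        var request = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"]
                        .createInstance(Ci.nsIXMLHttpRequest);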
        request.open("GET", stylesheets[i].href, false);
        request.send(null);
        var style = doc.createElement("style");
        style.setAttribute("type", "text/css");
        style.textContent = request.responseText;
        stylesheets[i].parentNode.replaceChild(style, stylesheets[i]);
        i--;
    }
    catch (e)
    {
        // Ignore download errors
    }
}

// Serialize the document into a string again
html = Cc["@mozilla.org/xmlextras/xmlserializer;1"]
         .createInstance(Ci.nsIDOMSerializer)
         .serializeToString(doc.documentElement);

// Now sanitize the HTML code
var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
var sanitizedHTML = parser.sanitize(html, parser.SanitizerAllowStyle);

Note that I used a synchronous XMLHttpRequest to download the stylesheet contents. This was done for simplicity only; your final code should use asynchronous downloads (most likely via the request module) so that it does not hang the user interface.
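One possible shape for the asynchronous variant is sketched below. It is not tied to a particular download API: fetchText(url, callback) is a hypothetical function you would supply yourself (for example a thin wrapper around the SDK's request module) that calls callback(error, text) exactly once. inlineStylesheets and done are likewise illustrative names.

```javascript
// Replace every <link> in `doc` with an inline <style>, without blocking.
// `fetchText(url, callback)` is a hypothetical downloader supplied by the
// caller; it must call callback(error, text) exactly once per URL.
function inlineStylesheets(doc, fetchText, done)
{
    // Copy the live collection so that replacing nodes doesn't shift indexes
    var links = Array.prototype.slice.call(doc.getElementsByTagName("link"));
    var pending = links.length;
    if (pending == 0)
    {
        done();
        return;
    }

    links.forEach(function(link)
    {
        fetchText(link.href, function(error, text)
        {
            if (!error)
            {
                var style = doc.createElement("style");
                style.setAttribute("type", "text/css");
                style.textContent = text;
                link.parentNode.replaceChild(style, link);
            }
            // Download errors are ignored, as in the synchronous version

            // Continue once the last download has reported back
            if (--pending == 0)
                done();
        });
    });
}
```

Serializing and sanitizing would then move into the done callback, since the document is only complete once all downloads have finished.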

And the first parameter in that is said to be a document element. I have no idea what that is supposed to be?

You don't need that. Just use the nsIParserUtils.sanitize method, which takes a string as input and returns the sanitized version as output:

var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
var sanitizedHTML = parser.sanitize(html, flags);

See the "Constants" section of the page linked above to find out which flags you need for your scenario.
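The flags argument is a bit mask, so several constants can be combined with the bitwise OR operator. The sketch below shows one way to build it; buildFlags and its option names are hypothetical, and the constant names other than SanitizerAllowStyle are given as I recall them from the nsIParserUtils interface, so verify them against the "Constants" section before relying on them.

```javascript
// Combine nsIParserUtils sanitizer constants into one flags value.
// `buildFlags` and the option names are illustrative, not a Mozilla API.
// The constants are read from the parser object rather than hard-coded.
function buildFlags(parser, options)
{
    var flags = 0;
    if (options.allowStyle)
        flags |= parser.SanitizerAllowStyle;     // keep <style> and style=""
    if (options.allowComments)
        flags |= parser.SanitizerAllowComments;  // assumed name: keep comments
    if (options.dropForms)
        flags |= parser.SanitizerDropForms;      // assumed name: remove forms
    return flags;
}

// In the add-on, `parser` would be the nsIParserUtils service:
//   var flags = buildFlags(parser, { allowStyle: true, dropForms: true });
//   var sanitizedHTML = parser.sanitize(html, flags);
```

Passing 0 applies the strictest defaults; each flag either relaxes or tightens one specific rule.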
