Django security: dealing with user input. Is html.strip_tags enough, or should I use bleach?

Question

I'm accepting user input on a small forum I run. This is what I do with the user's input:

First, call strip_tags from django.utils.html on the user's cleaned_data[input].
Save it to the database (PostgreSQL).
Query the text and use a regex to replace \n with <br> and to preserve the spaces entered by users.
Then, render it with {{ text|safe }} (if I don't mark it as safe, the page shows literal <br> tags instead of line breaks between paragraphs).
Finally, run some jQuery plugins on the text: Autolinker.js to detect and "urlize" hyperlinks, and trunk8 to control its length.

Because I use {{ text|safe }}, I am worried about malicious input. Is html.strip_tags enough?

The documentation for strip_tags says:

"Tries to remove anything that looks like an HTML tag from the string, that is anything contained within <>. Absolutely NO guaranty is provided about the resulting string being entirely HTML safe. So NEVER mark safe the result of a strip_tag call without escaping it first, for example with escape()."

The documentation for Python's Bleach library says:

"The primary goal of Bleach is to sanitize user input that is allowed to contain some HTML as markup and is to be included in the content of a larger page."

Because the user input is not allowed to contain any HTML, my guess is that Bleach is not needed, but I am kind of a noob, so your suggestions will be appreciated.

Answer 1

Quoting the docs on striptags:

No safety guarantee

Note that striptags doesn't give any guarantee about its output being entirely HTML safe, particularly with non valid HTML input. So NEVER apply the safe filter to a striptags output. If you are looking for something more robust, you can use the bleach Python library, notably its clean method.
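To illustrate why tag stripping alone is not a safety guarantee, here is a minimal sketch in plain Python. The `naive_strip_tags` helper is a hypothetical stand-in that removes anything of the form `<...>`; it is not Django's actual strip_tags implementation, which handles more cases. The point is only that malformed input can leave a `<` behind, and that escaping the result neutralizes it.

```python
import re
from html import escape


def naive_strip_tags(text: str) -> str:
    """Hypothetical stand-in: remove anything that looks like <...>."""
    return re.sub(r"<[^>]*>", "", text)


# The <img ...> tag is never closed, so the regex does not match it.
user_input = "hi <b>there</b> <img src=x onerror=alert(1)"

stripped = naive_strip_tags(user_input)
# The well-formed <b> tags are gone, but the unclosed tag survives.
print(stripped)

# Escaping turns the leftover "<" into "&lt;", so it renders as text.
safe = escape(stripped)
print(safe)
```

Marking `stripped` as safe would hand the browser a raw `<img` fragment; marking `safe` as safe would not, because no literal `<` remains.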

I think the answer here is to use bleach to strip the tags, as easy as bleach.clean(text, tags=[], strip=True). Note that without strip=True, bleach escapes disallowed tags instead of removing them. Plus, with bleach.linkify you can take care of the URLs as well.

Regarding your general process: if the string is generated once and queried multiple times, why aren't you adding the line breaks and URLs when saving?

Answer 2

If the only reason you need to mark the input as "safe" is so that it will display the <br> tags that you inserted where users typed line breaks, then your best approach is to use the linebreaks filter. From the Django documentation:

linebreaks

Replaces line breaks in plain text with appropriate HTML; a single newline becomes an HTML line break (<br>) and a new line followed by a blank line becomes a paragraph break (</p>).

For example:
 {{ value|linebreaks }} 
If value is Joel\nis a slug, the output will be <p>Joel<br>is a slug</p>.

Instead of using a regex to replace newlines with <br> tags in your database, just leave the data in there as the user entered it. Then, you can display it in a template with

{{ text|striptags|linebreaks }}

This will first remove (most) HTML tags from your user's input, then add in <br> and <p> tags for newlines. It does not mark the string as safe, though, so any tags left in the user's input will be escaped; only the tags created by linebreaks will have any effect.

(Note that if you don't want <p> tags, you can use the variant filter linebreaksbr.)
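For intuition, the escape-then-insert-breaks idea behind these filters can be sketched in plain Python. This is only an approximation of what linebreaksbr does when autoescaping is on, not Django's implementation, and `render_plain_text` is a hypothetical helper name.

```python
from html import escape


def render_plain_text(text: str) -> str:
    """Escape user text first, then turn newlines into <br> tags."""
    # Escape before adding markup, so the only real tags in the
    # result are the <br> tags inserted below.
    escaped = escape(text)
    # Normalize Windows and old Mac line endings to "\n".
    normalized = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "<br>")


rendered = render_plain_text("<script>alert(1)</script>\nJoel\nis a slug")
print(rendered)
# &lt;script&gt;alert(1)&lt;/script&gt;<br>Joel<br>is a slug
```

The order matters: escaping after inserting the <br> tags would escape those tags too, which is the symptom described in the question.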
