
How to prevent bleach from escaping > (blockquote) tag used in Markdown

I'm using bleach to sanitize user input. But I also use Markdown, which means I need the blockquote > symbol to go through without being escaped as &gt; so I can pass it to misaka for rendering.

The documentation says that by default it escapes HTML markup, but it doesn't say how to turn that off for the > symbol. I would still like it to escape actual HTML tags.

http://bleach.readthedocs.org/en/latest/clean.html

Any other ideas for sanitizing input while maintaining the ability to use Markdown would be appreciated.

Bleach is an HTML sanitizer, not a Markdown sanitizer. As explained here, you should render the user input with Markdown first and then pass the resulting HTML through Bleach, like this:

sanitized_html = bleach.clean(markdown.markdown(some_text))

For more info, see my previously referenced comment.

Do you need to strip all tags but leave > as it is?

  1. Strip all tags and keep the output.
  2. HTML-decode the output of step 1 and pass the result to misaka.

Simple way for step 2:

output = output.replace('&gt;', '>')

A more robust way:

import html  # Python 3; on Python 2 use HTMLParser.HTMLParser().unescape instead
# sanitized_user_input is the output of step 1
s = html.unescape(sanitized_user_input)
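
One caveat on the decode step: unescaping every entity also turns something like &lt;script&gt; back into a literal tag, which matters if the Markdown renderer passes raw HTML through. A narrower option is to restore only the &gt; markers that start a line, since those are the ones Markdown reads as blockquotes. The following is a minimal sketch using only the standard library. The helper name restore_blockquotes is hypothetical, and html.escape is used here only as a stand-in for the sanitizer's escaped output, which is an assumption about what that output looks like.

```python
import html
import re

# An escaped '>' at the start of a line (after at most three spaces),
# possibly repeated for nested quotes: a Markdown blockquote marker.
_QUOTE_PREFIX = re.compile(r"^( {0,3})((?:&gt; ?)+)", re.MULTILINE)


def restore_blockquotes(escaped_text):
    """Turn leading '&gt;' markers back into '>'; leave other entities escaped.

    Hypothetical helper for illustration, not part of bleach or misaka.
    """
    def _unescape_prefix(match):
        return match.group(1) + match.group(2).replace("&gt;", ">")

    return _QUOTE_PREFIX.sub(_unescape_prefix, escaped_text)


# Stand-in for sanitized input: html.escape mimics an escaping sanitizer.
escaped = html.escape("> quoted line\n<script>alert(1)</script>", quote=False)
restored = restore_blockquotes(escaped)
print(restored)
```

The restored text can then be handed to the Markdown renderer. A &gt; in the middle of a line stays escaped, so only blockquote markers are affected.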
