How to prevent bleach from escaping the > (blockquote) symbol used in Markdown

Question

I'm using bleach to sanitize user input. But I use Markdown, which means I need the blockquote > symbol to go through without being escaped as &gt; so I can pass it to misaka for rendering.

The documentation says that by default it escapes HTML markup, but it doesn't say how to turn that off for the > symbol. I would still like it to escape actual HTML tags.

http://bleach.readthedocs.org/en/latest/clean.html

Any other ideas for sanitizing input while maintaining the ability to use Markdown would be appreciated.

Answer 1

Bleach is an HTML sanitizer, not a Markdown sanitizer. As explained here, you should run your user input through Markdown first, then through Bleach. Like this:

sanitized_html = bleach.clean(markdown.markdown(some_text))

For more info, see my previously referenced comment.
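
A rough illustration of why the order matters, using only the standard library. Here `html.escape` stands in for an HTML sanitizer; it is not bleach, and it is only assumed to resemble the escaping the question describes. The `markdown_source` string is a made-up example. Escaping first turns the blockquote marker into an entity, so a Markdown renderer would no longer see a blockquote.

```python
import html

# Hypothetical user input: a Markdown blockquote followed by raw HTML.
markdown_source = "> quoted text\n\n<script>alert(1)</script>"

# Escaping BEFORE Markdown rendering also escapes the leading '>' marker.
escaped_first = html.escape(markdown_source)

first_line = escaped_first.splitlines()[0]
print(first_line)                  # the marker is now an entity
print(first_line.startswith(">"))  # the blockquote marker is gone
```

Rendering the Markdown first and sanitizing the resulting HTML avoids this, because by then the `>` marker has already been converted into a blockquote element.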

Answer 2

Do you need to strip all tags but leave > as it is?

1. Strip all tags and get the output.
2. HTML-decode the output of step 1, and pass that data to misaka.

A simple way to do step 2:

output.replace('&gt;', '>')

A more robust way:

import html  # Python 3; on Python 2 use HTMLParser.HTMLParser().unescape

# Decode HTML entities such as &gt; back into literal characters
s = html.unescape(sanitized_user_input)
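
To compare the two options for step 2, here is a small sketch using only the standard library. The `escaped` string is a made-up example of sanitizer output, not actual bleach output. It shows that the targeted replace restores only `>`, while a full unescape also restores `<` and `&`.

```python
import html

# Hypothetical sanitizer output: every special character is entity-escaped.
escaped = "&gt; quoted line\n&lt;b&gt;bold&lt;/b&gt; &amp; more"

# Option 1: restore only the '>' character.
only_gt = escaped.replace("&gt;", ">")

# Option 2: decode every HTML entity (Python 3 counterpart of HTMLParser.unescape).
fully_decoded = html.unescape(escaped)

print(only_gt)
print(fully_decoded)
```

Because the full decode turns `&lt;b&gt;` back into a real tag, the rendered HTML may still need sanitizing afterwards, as Answer 1 suggests.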

2 answers

Answer 1
2 2014-02-21 17:12:56

Answer 2
0 Accepted 2014-02-21 08:00:11
