
Disable HTML escaping inside backticks in python-markdown

I noticed that python-markdown always escapes HTML entities inside backticks, even with safe=False:

In [1]: import markdown

In [2]: markdown.markdown("&")
Out[2]: u'<p>&amp;</p>'

In [3]: markdown.markdown("*&amp;*")
Out[3]: u'<p><em>&amp;</em></p>'

In [4]: markdown.markdown("`&amp;`")
Out[4]: u'<p><code>&amp;amp;</code></p>'

Is this a bug or a feature? Either way, is there a way to keep HTML entities unchanged?

Backticks mark an inline code span, and everything inside a code span is displayed literally, so HTML special characters there must be escaped: &amp; becomes &amp;amp; so that the browser shows the five characters &amp;. That is intended behavior, not a bug. I'm not sure why you would want to get around it, and there may be better ways to reach your goal, but python-markdown does not process Markdown inside block-level HTML tags such as <div>, so enclosing your HTML entities in do-nothing HTML might suit your purposes.
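
The contrast between Out[3] and Out[4] can be modelled with two simple rules. The sketch below is a simplified illustration under that assumption, not python-markdown's real code, and escape_text and escape_code are hypothetical names: in ordinary text an & that already begins an entity is left alone, while inside a code span every &, < and > is escaped, entities included.

```python
import re

# Simplified model (an assumption, not python-markdown's actual code) of the
# two escaping rules behind Out[2] to Out[4] in the question.

# Matches a "&" that does NOT already start an entity such as &amp;, &#38; or &#x26;.
_BARE_AMP = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)")


def escape_text(text):
    """Ordinary text: escape bare '&' characters, keep existing entities."""
    return _BARE_AMP.sub("&amp;", text)


def escape_code(text):
    """Code span content: escape every '&', '<' and '>', entities included."""
    # "&" must be replaced first, or the "&" in "&lt;"/"&gt;" would be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


print(escape_text("&"))      # &amp;
print(escape_text("&amp;"))  # &amp;
print(escape_code("&amp;"))  # &amp;amp;
```

With the <div> wrapper, neither rule applies: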

>>> import markdown
>>> markdown.markdown("<div>`&amp;`</div>")
u'<div>`&amp;`</div>'

If you find the <div> tags objectionable, you can strip them out afterward fairly easily by giving the div a class and post-processing the output with an HTML parsing tool such as BeautifulSoup.

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<div class='nothing'>`&amp;`</div>")
>>> for div in soup.findAll('div', 'nothing'):
...     div.replaceWithChildren()
>>> print soup
`&amp;`
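
The session above uses BeautifulSoup 3, which only runs on Python 2 (in the current bs4 package the equivalent method is unwrap()). If you would rather avoid a third-party dependency, the same unwrap step can be sketched with the standard library's html.parser. This swaps in a different technique: a hypothetical unwrap_divs helper that re-emits the markup while dropping only the tags of <div class="nothing"> wrappers. It assumes reasonably well-formed input and writes entities such as &amp; back out as they were.

```python
from html.parser import HTMLParser


class DivUnwrapper(HTMLParser):
    """Re-emits HTML, dropping the tags (but not the contents) of wrapper divs."""

    def __init__(self, wrapper_class="nothing"):
        # convert_charrefs=False reports entities such as &amp; as separate
        # events, so they can be written back out instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.wrapper_class = wrapper_class
        self.out = []
        self.div_stack = []  # one flag per open <div>: True if it is a wrapper

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            is_wrapper = self.wrapper_class in classes
            self.div_stack.append(is_wrapper)
            if is_wrapper:
                return  # drop the opening wrapper tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack and self.div_stack.pop():
            return  # drop the closing tag that matches a wrapper
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def unwrap_divs(html, wrapper_class="nothing"):
    """Hypothetical helper: remove wrapper <div>s, keep everything inside them."""
    parser = DivUnwrapper(wrapper_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Called on the same input as the BeautifulSoup session, unwrap_divs("<div class='nothing'>`&amp;`</div>") returns `&amp;` with the backticks intact. Tracking open divs on a stack means a nested, non-wrapper <div> keeps both of its tags.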

This may be a bit more complicated than what you initially wanted, but short of fundamentally modifying python-markdown, it is probably the simplest solution.
