
Unicode error while parsing table in python markdown

I am using Python Markdown to parse the following table.

Escape sequences |  Character represented
-----------------|--------------------------
\b      |   Backspace
\t      |   Tab
\f      |   Form feed
\n      |   New line
\r      |   Carriage return
\\      |   Backslash
\'      |   Single quote
\"      |   Double quote
\uNNNN  |   where NNNN is a unicode number, with this escape sequence you can print unicode characters

Here is the code I am using:

import markdown

# str holds the Markdown source of the table above
html = markdown.markdown(str, extensions=['markdown.extensions.tables', 'markdown.extensions.fenced_code',
                                          'markdown.extensions.toc', 'markdown.extensions.wikilinks'])
print(html)

And here is the error:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 1000-1001: truncated \uXXXX escape

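The problem is that your input string contains backslashes, which have a special meaning inside a Python string literal. Python reads \uNNNN as the start of a Unicode escape, and because NNNN is not four hexadecimal digits, it raises the SyntaxError while compiling the literal, before Markdown ever runs. To make it work, your input data should look like this: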

Escape sequences |  Character represented
-----------------|--------------------------
\\b      |   Backspace
\\t      |   Tab
\\f      |   Form feed
\\n      |   New line
\\r      |   Carriage return
\\\\      |   Backslash
\\'      |   Single quote
\\"      |   Double quote
\\uNNNN  |   where NNNN is a unicode number, with this escape sequence you can print unicode characters

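That is, each backslash should be escaped with another backslash. A crude way to achieve this is to preprocess the text before parsing it with Markdown. Note that this only helps for a string that already exists at runtime; it cannot repair a literal that fails to compile.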

str = str.replace('\\', '\\\\')  # yes, the backslashes are doubled here too :); replace() returns a new string
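
If the table lives in a string literal in your source file, an alternative to doubling every backslash by hand is a raw string literal (the r prefix). The sketch below is only an illustration: table_md and literal_compiles are hypothetical names, and the table is shortened. It uses the built-in compile() to show that the plain literal is rejected while the raw one is accepted.

```python
# Raw string literal: the r prefix tells Python to keep backslashes as written,
# so \uNNNN is no longer treated as a (truncated) Unicode escape.
table_md = r"""
Escape sequences |  Character represented
-----------------|--------------------------
\n      |   New line
\uNNNN  |   where NNNN is a unicode number
"""


def literal_compiles(source):
    """Return True if `source` is valid Python source, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# The same one-line assignment without and with the r prefix, as source text.
plain_literal = 's = "\\uNNNN"'   # source text is: s = "\uNNNN"
raw_literal = 's = r"\\uNNNN"'    # source text is: s = r"\uNNNN"

print(literal_compiles(plain_literal))  # False
print(literal_compiles(raw_literal))    # True
```

The raw string table_md can then be passed to markdown.markdown() with the same extensions list as in the question.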
