
How to encode/decode escape sequence characters in Python

How can I encode or decode the escape sequence character '\x13' in Python into a character that is valid in RSS or XML?

My use case: I am getting data from arbitrary sources and building an RSS feed from that data. The data sometimes contains escape sequence characters, which break my RSS feed.

So how can I sanitize input data that contains escape sequence characters?

\x13 (ASCII 19, 'DC3') can't be escaped; it is invalid in XML 1.0, period. You can include one in XML 1.1, encoded as &#19; or &#x13;, but then you have to include the <?xml version="1.1"?> declaration, and many tools won't like it.
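To see the problem concretely, here is a minimal sketch using the standard library's xml.etree.ElementTree, which parses XML 1.0. The helper name is_well_formed and the sample strings are made up for illustration; the point is only that both the literal character and its numeric reference are rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "<title>bad \x13 char</title>"      # literal DC3 in the text
ref = "<title>bad &#19; char</title>"     # DC3 written as a character reference
clean = "<title>no control chars</title>"

for sample in (raw, ref, clean):
    print(repr(sample), is_well_formed(sample))
```

Only the last sample should parse, which is why escaping alone does not help here.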

I've no idea why that character would be included in your data, but the way forward is probably to completely remove control codes. For example:

import re
s = re.sub(r'[\x00-\x08\x0B-\x1F]', '', s)  # drop control codes, keeping tab (\x09) and newline (\x0A)
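If you want to go a step further, a possible sketch is to drop everything outside the character ranges XML 1.0 allows and then escape the markup characters. The function name sanitize_for_xml is hypothetical, and this variant assumes Python 3 str input. Unlike the one-liner above, it keeps carriage returns (\x0D), which XML 1.0 permits.

```python
import re
from xml.sax.saxutils import escape

# Everything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_for_xml(text):
    """Drop characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", text))


print(sanitize_for_xml("Tom & Jerry\x13 <episode 1>"))
```

Removing first and escaping second means the result can be placed directly into element text of the feed.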

For some kinds of escape sequence (e.g. ANSI colour codes) you might still get stray (non-control) characters left in there, in which case you'd probably want a custom parser for that particular format.
