
Using decode() vs. regex to unescape this string

I have the following string and I'm trying to figure out the best practice for unescaping it.

The solution has to be somewhat flexible: I'm receiving this input from an API and I can't be absolutely certain that the current character structure (\\n as opposed to \\r) will always be the same.

'"If it ain\\'t broke, don\\'t fix it." \\nWent in for a detailed car wash.\\nThe attendants raved-up my engine when taking the car into the tunnel. NOTE: my car is...'

This regex seems like it should work:

text_excerpt = re.sub(r'[\s"\\]', ' ', raw_text_excerpt).strip()

I've also read that decode() might work (and would be a better solution generally).

raw_text_excerpt.decode('string_unescape')

I tried something along those lines and it didn't work. Any suggestions? Is regex best here?

The codec you're looking for is string-escape:

>>> print "\\'".decode("string-escape")
'

I'm not sure which version they added it in, though... you could be using an older version that doesn't have it. I'm running:

Python 2.6.6 (r266:84292, Mar 25 2011, 19:36:32) 
[GCC 4.5.2] on linux2
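
For anyone on Python 3, where the string-escape codec no longer exists, here is a rough sketch of the same idea using the unicode_escape codec instead. The helper name unescape and the shortened sample string are made up for illustration, and it assumes the input contains only well-formed escapes (a stray trailing backslash would raise an error). The whitespace collapse at the end is one possible way to cope with \n vs. \r vs. \r\n, not the only one.

```python
import re


def unescape(raw_text):
    """Hypothetical helper: interpret backslash escapes such as \\n, \\r and \\'."""
    # backslashreplace turns characters outside Latin-1 into \uXXXX escapes,
    # which unicode_escape then decodes back, so non-ASCII text is preserved.
    return raw_text.encode("latin-1", "backslashreplace").decode("unicode_escape")


# Shortened version of the string from the question (literal backslashes).
raw_text_excerpt = r'''"If it ain\'t broke, don\'t fix it." \nWent in for a detailed car wash.'''

decoded = unescape(raw_text_excerpt)

# Collapse any run of whitespace (\n, \r, \r\n, tabs) into a single space.
text_excerpt = " ".join(decoded.split())
print(text_excerpt)

# For comparison: the regex from the question blanks out the backslash
# but leaves the letter that followed it behind.
regex_result = re.sub(r'[\s"\\]', ' ', r'a\nb')
print(regex_result)
```

The comparison at the end suggests why the regex alone is not enough here: it treats the backslash as a character to blank out rather than as the start of an escape sequence, so the "n" survives in the output.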
