
How to replace a double backslash with a single backslash in python?

I have a string that contains double backslashes. I want to replace each double backslash with a single backslash so that the Unicode character codes in it can be parsed correctly.

(Pdb) p fetched_page
'<p style="text-align:center;" align="center"><strong><span style="font-family:\'Times New Roman\', serif;font-size:115%;">Chapter 0<\\/span><\\/strong><\\/p>\n<p><span style="font-family:\'Times New Roman\', serif;font-size:115%;">Chapter 0 in \\u201cDreaming in Code\\u201d give a brief description of programming in its early years and how and why programmers are still struggling today...'

Inside this string you can see escaped Unicode character codes, such as:

\\u201c

I want to turn this into:

\u201c

Attempt 1:

fetched_page.replace('\\\\', '\\')

but this doesn't work -- it searches for quadruple backslashes.

Attempt 2:

fetched_page.replace('\\', '\')

But this results in an end of line error.

Attempt 3:

fetched_page.decode('string_escape')

But this had no effect on the text. All the double backslashes remained as double backslashes.

You can try codecs.escape_decode; it should decode the escape sequences.
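A minimal Python 3 sketch of that suggestion. codecs.escape_decode is, as far as I know, an undocumented helper re-exported from the internal _codecs module. It works on bytes-style escapes (such as \n and \x41) and returns a (decoded_bytes, consumed_length) tuple. It does not understand \u escapes, so the sample below deliberately sticks to byte escapes.

```python
import codecs

# Bytes holding a literal backslash + "x41" and a literal backslash + "n"
# (the doubled backslashes are only Python source-code escaping).
raw = b"\\x41\\n"

# escape_decode is a low-level, undocumented helper; it returns a tuple
# of (decoded bytes, number of input bytes consumed).
decoded, consumed = codecs.escape_decode(raw)

print(decoded)   # b'A\n'
print(consumed)  # 6
```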

Python3:

>>> b'\\u201c'.decode('unicode_escape')
'“'

or

>>> '\\u201c'.encode().decode('unicode_escape')
'“'
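One caveat about the encode().decode('unicode_escape') route: str.encode() produces UTF-8, but the unicode_escape codec reads every non-escape byte as Latin-1, so non-ASCII characters already present in the text come out garbled. A minimal sketch of a common workaround, assuming Python 3 and a str input, is to encode with Latin-1 and the backslashreplace error handler first. decode_u_escapes is a hypothetical helper name.

```python
def decode_u_escapes(text: str) -> str:
    """Hypothetical helper: turn literal backslash-u sequences such as the
    six characters \\u201c into the characters they name.

    Characters outside Latin-1 are first rewritten as backslash escapes by
    'backslashreplace'; Latin-1 characters map one-to-one onto bytes, so
    'unicode_escape' restores both kinds correctly.
    """
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


# A real "é" followed by literal backslash-u escapes.
sample = "caf\u00e9 \\u201cDreaming in Code\\u201d"

# Naive version: the two UTF-8 bytes of "é" are misread as Latin-1.
naive = sample.encode().decode("unicode_escape")
print(naive)                     # cafÃ© “Dreaming in Code”
print(decode_u_escapes(sample))  # café “Dreaming in Code”
```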

I'm not getting the behaviour you describe:

>>> x = "\\\\\\\\"
>>> print x
\\\\
>>> y = x.replace('\\\\', '\\')
>>> print y
\\

When you see '\\\\\\\\' in your output, you're seeing twice as many backslashes as there are in the string, because each one is escaped. The code you wrote should work fine. Try printing out the actual values instead of only looking at how the REPL displays them.
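The same check in Python 3 syntax, using len() to count the backslash characters instead of trusting the displayed form:

```python
x = "\\\\\\\\"          # this literal spells 4 backslash characters
print(len(x))           # 4
print(x)                # \\\\
print(repr(x))          # '\\\\\\\\'  (repr doubles each backslash)

y = x.replace("\\\\", "\\")  # replace each pair of backslashes with one
print(len(y))           # 2
print(y)                # \\
```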

To expand on Jeremy's answer: your problem is that '\' is an illegal string, because \' escapes the closing quote, so your string never terminates.
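For reference, a few ways to write a one-character string holding a backslash in Python 3. A raw string does not help here: r'\' fails for the same reason, because a raw string cannot end in a single backslash. The text variable is just an illustrative value.

```python
# Three equivalent spellings of a string containing exactly one backslash.
a = "\\"                    # escaped backslash
b = chr(92)                 # code point 92 is the backslash
c = "\N{REVERSE SOLIDUS}"   # Unicode name of U+005C

assert a == b == c and len(a) == 1

# With a valid literal, Attempt 2 turns into Attempt 1:
text = "Chapter 0<\\\\/span>"       # value contains two adjacent backslashes
print(text.replace("\\\\", "\\"))   # Chapter 0<\/span>
```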

It may be slightly overkill, but...

>>> import re
>>> a = '\\u201c\\u3012'
>>> re.sub(r'\\u[0-9a-fA-F]{4}', lambda x:eval('"' + x.group() + '"'), a)
'“〒'
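A variant of the same regex approach that avoids eval, converting each four-digit hex code with int(..., 16) and chr() instead. This is a swapped-in technique rather than the answerer's code, replace_u_escapes is a hypothetical name, and it only handles the four-digit \uXXXX form.

```python
import re


def replace_u_escapes(text: str) -> str:
    """Replace each literal backslash-u-XXXX sequence with its character."""
    # Capture only the four hex digits so they can be parsed directly.
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


a = "\\u201c\\u3012"
print(replace_u_escapes(a))  # “〒
```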

So yeah, the simplest solution would be ms4py's answer: call codecs.escape_decode on the string and take the result (or the first element of the result if escape_decode returns a tuple, as it seems to in Python 3). In Python 3, though, you'd want to use codecs.unicode_escape_decode when working with strings (as opposed to bytes objects).
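A short Python 3 sketch of the str-based call mentioned above. codecs.unicode_escape_decode is a low-level function re-exported from the internal _codecs module, so treat it as an implementation detail. Like escape_decode, it returns a (decoded_text, consumed_length) tuple, hence the unpacking.

```python
import codecs

s = "\\u201cDreaming in Code\\u201d"  # literal backslash-u sequences

# Returns a tuple: (decoded str, number of input bytes consumed).
decoded, consumed = codecs.unicode_escape_decode(s)
print(decoded)  # “Dreaming in Code”
```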

Interesting question, but in reality you have only one backslash character; the doubled form is just how Python represents it. If you make a list of the characters the string contains, like:

[s for s in string_object]

it shows every character and represents the backslash "\" as "\\", but don't let that confuse you: it is actually a single character. So, in the case of my example, it's just not a double backslash.

A real example:

>>> [s for s in 'usnDu\\NgAnA{I']
['u', 's', 'n', 'D', 'u', '\\', 'N', 'g', 'A', 'n', 'A', '{', 'I']
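To confirm this without reading the list's repr, you can count the characters directly (Python 3 syntax):

```python
s = "usnDu\\NgAnA{I"

print(len(s))           # 13
print(s.count("\\"))    # 1: a single backslash character
print(s[5], ord(s[5]))  # \ 92
```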

Just print it:

>>> a = '\\u201c'
>>> print a
\u201c

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 