
How to print a string with special characters with double backslash (like \\xe7) in Python

I have a string (obtained from an HTML web page request) that has special characters in it:

'Dimarts, 10 Mar\\xe7 2020'

If I print this string, it correctly unescapes the double backslash and prints only one:

Dimarts, 10 Mar\xe7 2020

But what I would like is to print the real character, which is character 231 (0xE7) = ç:

Dimarts, 10 Març 2020

I've tried replacing the double backslash with a single one, or even unescaping with the html library, with no luck. If I manually set a new variable with the text, and then print it, it works:

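import html

# Stand-in for the string obtained from the web page: a single real
# backslash followed by the characters 'x', 'e', '7'.
text = 'Dimarts, 10 Mar\\xe7 2020'

print('Original: ', repr(text))
print('Direct  : ', text)
print('Option 1: ', text.replace('\\\\', '\\'))
print('Option 2: ', text.replace(r'\\', '\\'))
print('Option 3: ', text.replace(r'\\', chr(92)))
print('Option 4: ', text.replace('\\', chr(92)))
print('Option 5: ', html.unescape(text))
# Here \xe7 is an escape sequence in the source code, so it becomes 'ç'.
text = 'Dimarts, 10 Mar\xe7 2020'
print('Manual:   ', text)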

And the result is never as expected:

Original:  'Dimarts, 10 Mar\\xe7 2020'
Direct  :  Dimarts, 10 Mar\xe7 2020
Option 1:  Dimarts, 10 Mar\xe7 2020
Option 2:  Dimarts, 10 Mar\xe7 2020
Option 3:  Dimarts, 10 Mar\xe7 2020
Option 4:  Dimarts, 10 Mar\xe7 2020
Option 5:  Dimarts, 10 Mar\xe7 2020
Manual:    Dimarts, 10 Març 2020

Is there any way to tell Python to correctly process the special characters?

Not sure if this is what you want but:

print(chr(231))

Will print the character you want.

It will also be printed by:

print(u"\xe7")
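If the string already contains the literal characters \xe7 (a backslash followed by x, e and 7), one possible workaround is to run it through the unicode_escape codec. This is only a sketch: it assumes every character in the string fits in Latin-1, and the variable names are illustrative. It treats the symptom rather than the cause, so decoding the downloaded bytes properly is usually the cleaner fix.

```python
# The string as repr() shows it in the question: one real backslash
# followed by the characters 'x', 'e', '7'.
text = 'Dimarts, 10 Mar\\xe7 2020'

# Assumption: the text contains only Latin-1 characters, so encoding to
# latin-1 is lossless. unicode_escape then interprets the backslash
# escapes the same way a Python string literal would.
decoded = text.encode('latin-1').decode('unicode_escape')

print(decoded)
```

Here `decoded` holds the ç as a single character, so printing it shows the real letter instead of the escape sequence.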

Well, it turns out I had a character encoding problem on Windows. The page content arrives as bytes, and I had to decode it before processing. Doing this fixed the problem:

import urllib.request

htmlfile = urllib.request.urlopen('http://www.somewebpage.com/')
for line in htmlfile:
    # Each line is a bytes object; decode it into a str.
    line = line.decode('cp1252')

It is also possible to decode the entire HTML at once:

import urllib.request

htmlfile = urllib.request.urlopen('http://www.somewebpage.com/').read()
htmldecoded = htmlfile.decode('cp1252')

Doing so fixed the problem and I could print the strings correctly.
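A likely explanation, sketched offline: the response body is bytes, and converting bytes with str() produces the repr of the bytes object, escapes included, whereas decode() interprets the bytes as text. The original post does not show the code that produced the escaped string, so this is an assumption. The sample bytes below are a hypothetical stand-in for one line of the page.

```python
# Assumed sample standing in for one line of the downloaded page;
# byte 0xE7 is 'ç' in cp1252.
raw = b'Dimarts, 10 Mar\xe7 2020'

# str() on a bytes object returns its repr, so the non-ASCII byte
# shows up as the four characters \xe7.
as_repr = str(raw)

# decode() turns the bytes into real text using the given encoding.
as_text = raw.decode('cp1252')

print(as_repr)
print(as_text)
```

The first print shows the escaped form with a b'...' wrapper, and the second shows the text with the real ç.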
