I have a string (obtained from an HTML web page request) that has special characters in it:
'Dimarts, 10 Mar\\xe7 2020'
If I print this string, it correctly unescapes the double backslash and prints only one:
Dimarts, 10 Mar\xe7 2020
But what I would like is to print the real character, which is character 231 (0xE7) = ç:
Dimarts, 10 Març 2020
I've tried replacing the double backslash with a single one, or even unescaping with the html library, with no luck. If I manually set a new variable with the text, and then print it, it works:
import html
text = 'Dimarts, 10 Mar\\xe7 2020'
print('Original: ', repr(text))
print('Direct : ', text)
print('Option 1: ', text.replace('\\\\', '\\'))
print('Option 2: ', text.replace(r'\\', '\\'))
print('Option 3: ', text.replace(r'\\', chr(92)))
print('Option 4: ', text.replace('\\', chr(92)))
print('Option 5: ', html.unescape(text))
text = 'Dimarts, 10 Mar\xe7 2020'
print('Manual: ', text)
And the result is never as expected:
Original: 'Dimarts, 10 Mar\\xe7 2020'
Direct : Dimarts, 10 Mar\xe7 2020
Option 1: Dimarts, 10 Mar\xe7 2020
Option 2: Dimarts, 10 Mar\xe7 2020
Option 3: Dimarts, 10 Mar\xe7 2020
Option 4: Dimarts, 10 Mar\xe7 2020
Option 5: Dimarts, 10 Mar\xe7 2020
Manual: Dimarts, 10 Març 2020
Is there any way to tell Python to correctly process the special characters?
Not sure if this is what you want but:
print(chr(231))
Will print the character you want.
It will also be printed by:
print(u"\xe7")
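If the string really does contain a literal backslash followed by xe7 (which is what repr shows in the question), one possible way to turn the escape sequence into the actual character is a round trip through the unicode_escape codec. This is only a minimal sketch, assuming the text contains nothing outside Latin-1 and that the escapes are of the \xNN kind; the variable names are illustrative.

```python
# The text as it appears in the question: a literal backslash followed by "xe7".
text = 'Dimarts, 10 Mar\\xe7 2020'

# str.replace cannot help: the string holds ONE backslash character, so
# replacing '\\' (one backslash) with chr(92) (also a backslash) is a no-op.
assert text.replace('\\', chr(92)) == text

# Encode to bytes, then let the unicode_escape codec interpret the escapes.
# Assumption: every character fits in Latin-1, otherwise encode() raises.
decoded = text.encode('latin-1').decode('unicode_escape')

print(decoded)            # Dimarts, 10 Març 2020
print(ord(decoded[15]))   # 231, i.e. 0xE7
```

This treats the symptom only. If the escaped text came from calling str() on bytes, decoding the bytes directly is the cleaner fix.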
Well, it turns out I had a problem with the encoding of the page on Windows. I had to decode the bytes before processing them. Doing this fixed the problem:
import urllib.request
htmlfile = urllib.request.urlopen('http://www.somewebpage.com/')
for line in htmlfile:
    # each line arrives as bytes; decode it into a str
    line = line.decode('cp1252')
It is also possible to decode the entire HTML at once:
htmlfile = urllib.request.urlopen('http://www.somewebpage.com/').read()
htmldecoded = htmlfile.decode('cp1252')
Doing so fixed the problem and I could print the strings correctly.
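A literal \xe7 in a string usually means that a bytes object was turned into text with str() instead of being decoded. The sketch below reproduces that without any network access. The sample bytes and the helper name decode_page are hypothetical, and cp1252 is used as the fallback only because it matched the page in this case. With a real response, htmlfile.headers.get_content_charset() can supply the declared charset when the server sends one.

```python
def decode_page(raw, declared_charset=None, fallback='cp1252'):
    """Decode raw page bytes (hypothetical helper, not part of urllib)."""
    return raw.decode(declared_charset or fallback)

# Sample bytes standing in for one line read from the response (assumed data).
raw_line = b'Dimarts, 10 Mar\xe7 2020'

# str() on bytes yields the repr, including the b'...' wrapper and a literal \xe7.
as_repr = str(raw_line)
print(as_repr)                 # b'Dimarts, 10 Mar\xe7 2020'

# Decoding converts byte 0xE7 into the character it encodes in cp1252.
print(decode_page(raw_line))   # Dimarts, 10 Març 2020
```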