I usually work with Python, but now I need to fix a bug in Go. I have a string like this:
<!-- \\xd0\\xbf\\xd0\\xbb\\xd0\\xb0\\xd1\\x82\\xd0\\xb5\\xd0\\xb6\\xd0\\xb5\\xd0\\xb9-->\\n \\n \\n <guarantees>\\n
How do I make it correct and readable? In Python I would use decode('unicode-escape'). What should I use in Go?
Update
I've edited the description: the string contains double backslashes.
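Go's standard library has no drop-in equivalent of Python's decode('unicode-escape'). However, strconv.UnquoteChar decodes one Go escape sequence at a time (\xHH, \n, \uXXXX, \\, octal), and that is enough to build one. Below is a minimal sketch that assumes the input uses Go/C-style escapes. decodeEscapes is a hypothetical helper name, and utf8.AppendRune needs Go 1.18 or later. Because the string contains double backslashes, the helper is applied twice: the first pass turns \\xd0 into the text \xd0, and the second turns \xd0 into the raw byte 0xD0. Python's codec would turn \xd0 into the character U+00D0. This helper instead produces the raw byte, so the result is already valid UTF-8 text.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// decodeEscapes (hypothetical helper) replaces Go/C-style escape sequences
// in s (\xHH, \n, \t, \uXXXX, \\, \", octal) with the bytes they denote.
// Bytes that are not part of an escape are copied unchanged.
func decodeEscapes(s string) (string, error) {
	var buf []byte
	for len(s) > 0 {
		if s[0] != '\\' {
			buf = append(buf, s[0])
			s = s[1:]
			continue
		}
		value, multibyte, tail, err := strconv.UnquoteChar(s, '"')
		if err != nil {
			return "", err
		}
		if multibyte {
			// \uXXXX and \UXXXXXXXX denote code points: store them as UTF-8.
			buf = utf8.AppendRune(buf, value)
		} else {
			// \xHH, octal and single-character escapes denote one raw byte.
			buf = append(buf, byte(value))
		}
		s = tail
	}
	return string(buf), nil
}

func main() {
	// The string from the question, with its literal double backslashes.
	in := `<!-- \\xd0\\xbf\\xd0\\xbb\\xd0\\xb0\\xd1\\x82\\xd0\\xb5\\xd0\\xb6\\xd0\\xb5\\xd0\\xb9-->\\n \\n \\n <guarantees>\\n`

	once, err := decodeEscapes(in) // \\xd0 -> \xd0 (still text)
	if err != nil {
		fmt.Println(err)
		return
	}
	twice, err := decodeEscapes(once) // \xd0 -> byte 0xD0
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(twice)
}
```

The two-pass approach keeps the helper simple. The same function works on data that is escaped only once.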
Update 1
I followed the advice from the answer https://stackoverflow.com/a/67172057/11029221 and repaired the part of the code that was doing the encoding in such a wrong way. I also found that in Go such text can be fixed like this:
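package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

func main() {
	a := `\\xd0\\xb5\\xd0\\xb6\\xd0\\xb5\\xd0\\xb9-->\\n\\n\\n<guarantees>\\n`
	// Quote turns each literal `\\` into `\\\\`; collapse that back to a single `\`.
	a = strconv.Quote(a)
	a = strings.ReplaceAll(a, `\\\\`, `\`)
	unquoted, err := strconv.Unquote(a)
	if err != nil {
		fmt.Println(err)
		return
	}
	var out string // rebuilt rune by rune; each invalid UTF-8 byte becomes U+FFFD
	str := []byte(unquoted)
	for len(str) > 0 {
		r, size := utf8.DecodeLastRune(str)
		out = string(r) + out
		str = str[:len(str)-size]
	}
	fmt.Printf("%s", out)
}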
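The same Quote / ReplaceAll / Unquote trick can be wrapped in a reusable function. The sketch below uses a hypothetical name, repairEscaped, and assumes that every escape in the input is double-escaped (\\xHH, \\n). The rune-by-rune loop only changes the text when the decoded bytes are not valid UTF-8. This version therefore checks utf8.ValidString and falls back to strings.ToValidUTF8 only when needed. One difference: ToValidUTF8 replaces each run of invalid bytes with a single U+FFFD, whereas the loop emits one U+FFFD per invalid byte.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// repairEscaped (hypothetical helper) decodes text whose escape sequences
// were written with doubled backslashes, e.g. `\\xd0\\xb5\\n`.
func repairEscaped(a string) (string, error) {
	// Quote doubles every backslash, so each `\\` in a becomes `\\\\`;
	// collapsing those back to `\` leaves ordinary Go escapes (\xd0, \n).
	q := strings.ReplaceAll(strconv.Quote(a), `\\\\`, `\`)
	s, err := strconv.Unquote(q)
	if err != nil {
		return "", err
	}
	if !utf8.ValidString(s) {
		// Replace each run of invalid bytes with U+FFFD.
		s = strings.ToValidUTF8(s, "\uFFFD")
	}
	return s, nil
}

func main() {
	out, err := repairEscaped(`\\xd0\\xb5\\xd0\\xb6\\xd0\\xb5\\xd0\\xb9-->\\n\\n\\n<guarantees>\\n`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```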
I'm not sure what @melpomene's criteria for "knowing what they are doing" are, but the following solution has worked previously, for example for decoding broken Hebrew text:
("\\u00c3\\u00a4"
    .encode('latin-1')
    .decode('unicode_escape')
    .encode('latin-1')
    .decode('utf-8')
)
outputs
'ä'
This works as follows:
1. The string, which contains only ASCII characters ('\', 'u', '0', '0', 'c', and so on), is converted to bytes using some reasonable 8-bit encoding. It doesn't really matter which one, as long as it maps ASCII characters to themselves.
2. A decoder interprets the '\u00c3' escape as the Unicode code point U+00C3 (LATIN CAPITAL LETTER A WITH TILDE, 'Ã'). From the point of view of your code this is nonsense, but when encoded with ISO-8859-1 ('latin-1') this code point becomes exactly the byte you need (0xC3), so...
3. ...the result is encoded again with 'latin-1', which yields the bytes C3 A4.
4. Finally, those bytes are decoded "properly" this time, as UTF-8, giving 'ä'.
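For comparison, here is a minimal Go sketch of the same four steps. It uses a hypothetical helper named fixMojibake. The input is assumed to contain \uXXXX escapes but no bare double quotes and no raw newlines, because strconv.Unquote rejects those inside a double-quoted literal. Go has no built-in latin-1 codec. However, latin-1 maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so step 3 can be written as byte(r) for each rune.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"unicode/utf8"
)

// fixMojibake (hypothetical helper) mirrors the Python chain
// s.encode('latin-1').decode('unicode_escape').encode('latin-1').decode('utf-8').
func fixMojibake(s string) (string, error) {
	// Steps 1-2: interpret \u00c3-style escapes as code points (U+00C3, ...).
	decoded, err := strconv.Unquote(`"` + s + `"`)
	if err != nil {
		return "", err
	}
	// Step 3: "encode as latin-1": each code point <= U+00FF becomes one byte.
	raw := make([]byte, 0, len(decoded))
	for _, r := range decoded {
		if r > 0xFF {
			return "", fmt.Errorf("code point %U is not representable in latin-1", r)
		}
		raw = append(raw, byte(r))
	}
	// Step 4: interpret the bytes as UTF-8.
	if !utf8.Valid(raw) {
		return "", errors.New("result is not valid UTF-8")
	}
	return string(raw), nil
}

func main() {
	out, err := fixMojibake(`\u00c3\u00a4`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // ä
}
```

Go strings are byte sequences, so Python's first encode('latin-1') step has no separate counterpart for ASCII-only input. Unquote works directly on the escaped text.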
Again, same remark as in the linked post: before investing too much energy trying to repair the broken text, you might want to try to repair the part of the code that is doing the encoding in such a strange way.