Python - change string to utf8

Question

I am trying to write Portuguese text to an HTML file, but I am getting some funny characters. How do I fix this?

first = """<p style="color: red; font-family: 'Liberation Sans',sans-serif">{}</p>""".format(sentences1[i]) 
f.write(first)

Expected output: Hoje, nós nos unimos ao povo ...

Actual output in the browser (Firefox on Ubuntu): Hoje, nÃ³s nos unimos ao povo ...

I tried doing this:

first = """<p style="color: red; font-family: 'Liberation Sans',sans-serif">{}</p>""".format(sentences1[i]) 
f.write(first.encode('utf8'))

Output in the terminal: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 65: ordinal not in range(128)

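Why am I getting this error, and how can I write other languages to an HTML document without the funny characters?
Alternatively, is there a different file type I can write to with the font formatting above?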

Answer 1

Your format string should be a Unicode string as well:

first = u"""<p style="color: red; font-family: 'Liberation Sans',sans-serif">{}</p>""".format(sentences1[i]) 
f.write(first)
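
A minimal sketch of the same idea, assuming the sentences may arrive either as Unicode text or as UTF-8 byte strings. `to_text` and `render_paragraph` are hypothetical helper names, not part of the question's code, and the sample sentence is only illustrative. Using `type(u"")` keeps the sketch runnable on both Python 2 and Python 3.

```python
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3

TEMPLATE = u"""<p style="color: red; font-family: 'Liberation Sans',sans-serif">{}</p>"""


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings (assumed UTF-8)."""
    if isinstance(value, TEXT_TYPE):
        return value
    return value.decode(encoding)


def render_paragraph(sentence):
    """Format the Unicode template with a sentence given as text or UTF-8 bytes."""
    return TEMPLATE.format(to_text(sentence))


# Illustrative input: UTF-8 bytes containing an o-acute (0xc3 0xb3).
paragraph = render_paragraph(b"Hoje, n\xc3\xb3s nos unimos ao povo")
```

With both the template and the argument as Unicode text, `.format` never has to fall back on an implicit ASCII decode.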

Answer 2

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

^ Read it!

This happens when you try to use .format on text read from a file that contains special characters.

>>> mystrf = u'special text here >> {} << special text'
>>> g = open('u.txt','r')
>>> lines = g.readlines()
>>> mystrf.format(lines[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>>

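Because the format string is Unicode, Python tries to decode the byte string read from the file as ASCII. So how do we fix that?

We simply tell Python the correct encoding.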

>>> line = mystrf.format(lines[0].decode('utf-8'))
>>> print line
special text here >> ß << special text
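
To make the implicit step visible, here is a small sketch that decodes the same two bytes explicitly: first as ASCII, which fails just as in the traceback, and then as UTF-8. The byte string `raw` is an assumed stand-in for a line read from `u.txt`.

```python
raw = b"\xc3\x9f"  # UTF-8 bytes for the sharp s, U+00DF

try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    # 0xc3 is outside the 7-bit ASCII range, so decoding fails.
    ascii_failed = True

text = raw.decode("utf-8")  # a single character of Unicode text
line = u"special text here >> {} << special text".format(text)
```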

However, when we try to write the result back to a file, it fails:

>>> towrite.write(line)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 21: ordinal not in range(128)

So we encode the line before writing it to the file:

>>> towrite.write(line.encode('utf-8'))
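
An alternative sketch, not part of the answer above: `io.open` (available since Python 2.6, and the same function as the built-in `open` in Python 3) takes an `encoding` argument and does the decoding and encoding at the file boundary, so the code in between only handles Unicode text. The file names are placeholders inside a temporary directory.

```python
import io
import os
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "u.txt")    # placeholder input file
dst_path = os.path.join(workdir, "out.txt")  # placeholder output file

# Create a sample input file containing one non-ASCII character.
with io.open(src_path, "w", encoding="utf-8") as g:
    g.write(u"\xdf\n")

mystrf = u"special text here >> {} << special text"

# Reading yields Unicode text; writing encodes it back to UTF-8.
with io.open(src_path, "r", encoding="utf-8") as g:
    lines = g.readlines()
with io.open(dst_path, "w", encoding="utf-8") as towrite:
    towrite.write(mystrf.format(lines[0].strip()))

# Read the result back as raw bytes to inspect what was written.
with io.open(dst_path, "rb") as check:
    written = check.read()
```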

Answer 3

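It looks like you are working with strings that are already UTF-8 encoded, so that part is fine. The problem is that the meta tag in your HTML output identifies the text as something other than UTF-8. For example, you might have <meta charset="ISO-8859-1">; you need to change it to <meta charset="UTF-8">.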
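
As a rough sketch of what that looks like end to end, the snippet below writes a small page whose declared charset matches the encoding used to write the file. The page skeleton, the output path, and the sample sentence are assumptions for illustration; only the `<p>` markup comes from the question.

```python
import io
import os
import tempfile

PAGE = u"""<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<p style="color: red; font-family: 'Liberation Sans',sans-serif">{}</p>
</body>
</html>
"""

path = os.path.join(tempfile.mkdtemp(), "out.html")  # placeholder path

# The declared charset and the encoding used for writing must agree.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(PAGE.format(u"Hoje, n\xf3s nos unimos ao povo"))

# Read the file back as raw bytes to inspect what the browser will receive.
with io.open(path, "rb") as f:
    data = f.read()
```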

The term for this kind of character set confusion is Mojibake.
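
A short sketch of how Mojibake arises, using an illustrative word with an accented letter: the text is encoded as UTF-8, but the reader decodes the bytes as ISO-8859-1, so each byte of the two-byte sequence becomes its own character.

```python
original = u"n\xf3s"                       # "nos" with an o-acute, U+00F3
utf8_bytes = original.encode("utf-8")      # the o-acute becomes 0xc3 0xb3

# A reader that wrongly assumes ISO-8859-1 maps each byte to one character.
garbled = utf8_bytes.decode("iso-8859-1")  # four characters instead of three

# Decoding with the encoding that was actually used restores the text.
restored = utf8_bytes.decode("utf-8")
```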

P.S. Your string starts with a byte order mark (BOM); you may want to remove it before using the string.
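
The byte 0xef in the question's error message is consistent with this, since the UTF-8 BOM is the byte sequence EF BB BF. One possible way to drop it, sketched with an assumed sample byte string, is the standard `utf-8-sig` codec or a manual check against `codecs.BOM_UTF8`:

```python
import codecs

raw = codecs.BOM_UTF8 + b"Hoje"  # assumed sample: UTF-8 bytes starting with EF BB BF

# Plain UTF-8 decoding keeps the BOM as the character U+FEFF.
with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

# Equivalent manual approach on the byte string itself.
if raw.startswith(codecs.BOM_UTF8):
    stripped = raw[len(codecs.BOM_UTF8):]
else:
    stripped = raw
```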

3 answers

Answer 1
1 2015-03-17 13:55:53

Answer 2
0 2015-03-17 15:46:10

Answer 3
0 2015-03-17 15:56:01
