How to export DataFrame to HTML with UTF-8 encoding?
I keep getting:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)
when I try:
df.to_html("mypage.html")
Here is a sample that reproduces the problem:
df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
df.to_html("mypage.html")
The elements in column "a" are of type "unicode".
When I export it to CSV it works, because you can do:
df.to_csv("myfile.csv", encoding="utf-8")
Your problem is in other code. Your sample code has a Unicode string that has been mis-decoded as latin1, Windows-1252, or similar, since it has UTF-8 sequences in it. Here I undo the bad decoding and redecode as UTF-8, but you'll want to find where the wrong decode is being performed:
>>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
>>> s.encode('latin1').decode('utf8')
u'Rue du Gu\xe9, 78120 Sonchamp'
>>> print(s.encode('latin1').decode('utf8'))
Rue du Gué, 78120 Sonchamp
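The same round trip can be wrapped in a small helper. This is only a sketch, assuming Python 3 and assuming the text really is UTF-8 that was decoded as latin1; the name fix_mojibake is hypothetical and not part of pandas or the standard library. Strings that cannot be repaired this way are returned unchanged.

```python
def fix_mojibake(text):
    """Hypothetical helper: repair UTF-8 text that was wrongly decoded as latin1."""
    try:
        # Recover the original bytes, then decode them with the right codec.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not latin1-encodable, or the bytes are not valid UTF-8:
        # assume the string was never mis-decoded and leave it alone.
        return text

broken = "Rue du Gu\xc3\xa9, 78120 Sonchamp"
print(fix_mojibake(broken))
```

Plain ASCII strings pass through untouched, so the helper can be applied to every cell of a column. It does not replace finding the place where the wrong decode happens.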
The issue is actually in using df.to_html("mypage.html") to save the HTML to a file directly. If instead you write the file yourself, you can avoid this encoding bug with pandas.
html = df.to_html()
with open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
You may also need to specify the character set in the head of the HTML for it to show up properly on certain browsers (HTML5 has UTF-8 as default):
<meta charset="UTF-8">
This was the only method that worked for me out of the several I've seen.
The way it worked for me:
html = df.to_html()
with open("dataframe.html", "w", encoding="utf-8") as file:
    # Declare the character set first so browsers decode the page as UTF-8
    file.write('<meta charset="UTF-8">\n')
    file.write(html)
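If a bare meta tag followed by a table feels too loose, the fragment can be wrapped in a complete page. The following is a minimal sketch, not something taken from the answers above: write_html_page is a hypothetical helper, and the hard-coded fragment stands in for the string returned by df.to_html() so that the example runs without pandas.

```python
import os
import tempfile

def write_html_page(fragment, path, title="DataFrame"):
    """Hypothetical helper: wrap an HTML fragment in a full UTF-8 page."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="UTF-8">\n'
        "<title>{}</title>\n"
        "</head>\n<body>\n{}\n</body>\n</html>\n"
    ).format(title, fragment)
    # Opening the file with an explicit encoding avoids the ASCII default.
    with open(path, "w", encoding="utf-8") as file:
        file.write(page)

# Stand-in for html = df.to_html()
fragment = "<table><tr><td>Rue du Gu\u00e9, 78120 Sonchamp</td></tr></table>"
out_path = os.path.join(tempfile.mkdtemp(), "dataframe.html")
write_html_page(fragment, out_path)
```

The file on disk then contains the accented character as UTF-8 bytes, and the meta tag tells the browser to read it that way.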
If you really need to keep the output as HTML, you could try cleaning the strings in a NumPy array before calling to_html.
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})

def clean_unicode(df):
    # Transform the DataFrame to a NumPy array (as_matrix() was removed in pandas 1.0)
    values = df.to_numpy(dtype=object, copy=True)
    # Undo the bad latin-1 decoding of every cell and decode the bytes as UTF-8;
    # this can raise UnicodeDecodeError for cells that are not mis-decoded UTF-8
    for idx, x in np.ndenumerate(values):
        values[idx] = str(x).encode("latin-1", "replace").decode("utf8")
    # Transform the NumPy array back into a DataFrame, keeping the labels
    return pd.DataFrame(values, index=df.index, columns=df.columns)

df = clean_unicode(df)
df.to_html("Results.html")  # Success!