Ruby, Nokogiri: How do I ensure UTF-8 throughout Nokogiri parsing, the ERB template, and the encoded HTML file?
I finally managed to parse parts of a website:
get '/' do
  url = '<website>'
  data = Nokogiri::HTML(open(url))
  @rows = data.css("td[valign=top] table tr")
  erb :muster
end
Now I am trying to extract a certain row in my view, so I put this in my HTML code:
<%= @rows[2] %>
It does return the markup, but it has problems with UTF-8. I expect:
<td class="class_name"> </td>
Instead, it shows:
<td class="class_name">�</td>
How do I ensure UTF-8 during Nokogiri parsing, ERB rendering, and HTML generation?
See: http://www.nokogiri.org/tutorials/parsing_an_html_xml_document.html#encoding
It looks like, in your case, the document declares that it is encoded as ISO-8859-1:
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
You can do the following to force Nokogiri to treat the stream as UTF-8:
data = Nokogiri::HTML(open(url), nil, Encoding::UTF_8.to_s)
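If the bytes really are ISO-8859-1, as the meta tag declares, the "�" is probably a non-breaking space (byte 0xA0) being read as UTF-8, where that lone byte is invalid. The sketch below reproduces this with plain Ruby strings and no Nokogiri. It is an illustration under that assumption, not a diagnosis of the actual page, and latin1_to_utf8 is a hypothetical helper name. Note that it transcodes the bytes rather than relabelling them, which is a different technique from forcing the parser encoding as above.

```ruby
# Hypothetical helper: treat the given bytes as ISO-8859-1 and transcode to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# A non-breaking space as it appears in an ISO-8859-1 page: the single byte 0xA0.
raw = [0xA0].pack("C")

# Relabelling the same byte as UTF-8 does not convert it; the string is invalid,
# which is what a browser renders as the replacement character.
mislabelled = raw.dup.force_encoding(Encoding::UTF_8)
puts mislabelled.valid_encoding?   # false

# Transcoding produces the two-byte UTF-8 form of U+00A0.
fixed = latin1_to_utf8(raw)
puts fixed.valid_encoding?         # true
puts fixed.bytes.inspect           # [194, 160]
```

The distinction matters because force_encoding only changes the label on the bytes, while encode rewrites the bytes themselves. Which one is right depends on what the server actually sends, so it is worth checking the raw response before choosing.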