Ruby, Nokogiri: How do I ensure UTF-8 throughout Nokogiri parsing, ERB templates, and the generated HTML?

Question

I finally managed to parse parts of a website:

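require 'sinatra'
require 'nokogiri'
require 'open-uri'

get '/' do
  url = '<website>'
  # URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs in Ruby 3.0+
  data = Nokogiri::HTML(URI.open(url))
  @rows = data.css("td[valign=top] table tr")
  erb :muster
end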

Now I am trying to output a specific row in my view, so I put this in my HTML template:

<%= @rows[2] %>

It does render the row, but something goes wrong with UTF-8. I expect this:

<td class="class_name">&nbsp;</td>

but instead I get this:

<td class="class_name">�</td>
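The `�` is the Unicode replacement character (U+FFFD). The sketch below uses only core Ruby to show one plausible mechanism behind it. It assumes that Nokogiri has decoded `&nbsp;` into the character U+00A0, serialized the row in the page's declared ISO-8859-1 charset, and that the browser then decodes the response as UTF-8:

```ruby
# Assumption: &nbsp; has already been decoded to U+00A0 (NO-BREAK SPACE).
nbsp = "\u00A0"

# In ISO-8859-1, U+00A0 is the single byte 0xA0.
latin1_bytes = nbsp.encode(Encoding::ISO_8859_1).b
p latin1_bytes.bytes                 # => [160]

# A lone 0xA0 byte is not valid UTF-8...
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
p as_utf8.valid_encoding?            # => false

# ...so a UTF-8 decoder substitutes U+FFFD, which displays as "�".
p as_utf8.scrub == "\uFFFD"          # => true

# Encoded as UTF-8, the same character is the valid two-byte sequence C2 A0.
p nbsp.encode(Encoding::UTF_8).bytes # => [194, 160]
```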

How do I ensure UTF-8 throughout Nokogiri parsing, ERB rendering, and HTML generation?
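For the ERB step, here is a minimal sketch that uses only Ruby's standard `erb` library. The `fragment` string is a hypothetical stand-in for `@rows[2]` after it has been serialized in ISO-8859-1. The idea is to transcode it to UTF-8 before it reaches the template, so that the rendered HTML is valid UTF-8 throughout:

```ruby
require 'erb'

# Hypothetical stand-in for a row serialized in the page's ISO-8859-1 charset.
fragment = "<td class=\"class_name\">\u00A0</td>".encode(Encoding::ISO_8859_1)

# Transcode (convert the bytes, not merely relabel them) to UTF-8 before templating.
utf8_fragment = fragment.encode(Encoding::UTF_8)

# Plain ERB does not HTML-escape <%= %>, so the markup is inserted as-is.
template = ERB.new("<table><tr><%= row %></tr></table>\n")
html = template.result_with_hash(row: utf8_fragment)

puts html.encoding        # UTF-8
puts html.valid_encoding? # true
```

With Nokogiri, the analogous step in the view would probably be `<%= @rows[2].to_html(encoding: 'UTF-8') %>`, which asks Nokogiri to serialize the node as UTF-8 instead of in the document's own encoding. This is an untested suggestion; check it against your Nokogiri version.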

Answer 1

It looks like, in your case, the document declares that it is encoded as ISO-8859-1:

<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">

You can do the following to force Nokogiri to treat the stream as UTF-8:

# The third argument tells Nokogiri which encoding to use when reading the input
data = Nokogiri::HTML(URI.open(url), nil, Encoding::UTF_8.to_s)
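Forcing UTF-8 relabels the incoming bytes rather than converting them. The core-Ruby sketch below illustrates why that works when the non-breaking space arrives as the ASCII entity `&nbsp;`. It also shows where it would go wrong if the page contained raw ISO-8859-1 bytes such as `é` (0xE9). The sample string is hypothetical:

```ruby
# Hypothetical raw response body from an ISO-8859-1 server: "café &nbsp;"
# ("é" is the single byte 0xE9 in ISO-8859-1).
raw = "caf\xE9 &nbsp;".b

# ASCII-only content such as the "&nbsp;" entity reads the same under either charset.
p "&nbsp;".b.force_encoding(Encoding::UTF_8) == "&nbsp;" # => true

# Relabeling the raw bytes as UTF-8 leaves 0xE9 as an invalid byte.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)
p relabeled.valid_encoding?         # => false

# Transcoding from the declared charset turns 0xE9 into the UTF-8 pair C3 A9.
transcoded = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
p transcoded == "caf\u00E9 &nbsp;"  # => true
p transcoded.valid_encoding?        # => true
```

If the page you scrape contains real Latin-1 characters, not just entities, transcoding from the declared charset is likely the safer choice than relabeling.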