
Ruby windows-1250 encoding

I'm trying to get data from a site that uses the windows-1250 charset. I have this code:

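require 'open-uri'
# Read the page, label the bytes as Windows-1250, transcode to UTF-8, then extract the marker fields.
html = URI.open('http://www.ceskybenzin.cz/mapa/0').read.force_encoding('Windows-1250').encode('UTF-8')
p html.scan(/addMarker\( point, '(.*?) - (.*?) - (.*?) - (.*?)', 'green', (.*?), bublina, 0 \);/)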

and I'm getting data like:

["EuroOil", "Prun\u00E9\u0159ov ", "U\u0161\u00E1k", "Zat\u00EDm nezadan\u00FD kraj", "181"]

Could someone tell me how to correctly get data from a windows-1250 site?

Thank you

You have Unicode (UTF-8) characters in your data, not win-1250.

To convert your current example strings to the correct text, you can do this:

data = ["EuroOil", "Prun\u00E9\u0159ov ", "U\u0161\u00E1k", "Zat\u00EDm nezadan\u00FD kraj", "181"]
data.map { |snippet| snippet.encode("UTF-8") }

=> ["EuroOil", "Prunéřov ", "Ušák", "Zatím nezadaný kraj", "181"]
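The \uXXXX escapes are only another notation for the same characters, which a short offline check can confirm. The sketch below is an illustration rather than part of the original answer: the variable names are made up, and the Windows-1250 bytes are simulated locally instead of being fetched from the site.

```ruby
# The escaped form from the question and the readable form are the same string.
escaped = "Prun\u00E9\u0159ov"
literal = "Prunéřov"
same_text = (escaped == literal)

# Simulate the raw bytes a Windows-1250 page would deliver (one byte per character).
raw_bytes = literal.encode("Windows-1250").b

# Same pipeline as in the question: label the bytes, then transcode to UTF-8.
decoded = raw_bytes.dup.force_encoding("Windows-1250").encode("UTF-8")

puts same_text          # the two notations compare equal
puts decoded == literal # the round trip gives back the original text
```

If both comparisons hold, the force_encoding/encode chain from the question is doing its job and the data is already correct UTF-8.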

If the output in your example comes from the console, that is because the console displays the data as UTF-8 rather than in the encoding of your source site (the parsing probably works correctly right up to the point where the result is displayed).
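To separate "the data is wrong" from "the display is wrong", it can help to look at the string's own properties instead of what the console prints. This is a minimal sketch with made-up variable names; it uses String#dump, which always escapes non-ASCII characters, to imitate the kind of output `p` can show when the console encoding is not UTF-8.

```ruby
town = "Prun\u00E9\u0159ov"

# Properties of the string itself, independent of any console.
encoding_name = town.encoding.name
valid = town.valid_encoding?
char_count = town.length    # characters
byte_count = town.bytesize  # bytes (two each for the two accented letters)

# dump escapes every non-ASCII character as \uXXXX.
dumped = town.dump

puts "console encoding assumed by Ruby: #{Encoding.default_external}"
puts "string encoding: #{encoding_name}, valid: #{valid}"
puts dumped # escaped view
puts town   # raw view, rendered by whatever the console supports
```

If the encoding is UTF-8 and valid_encoding? is true, the escapes are only a display form of correct text.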

a[0] => ["Kont.cz (NOVA-KONT)", "Praha 4", "Opatovsk\xC3\xA1", "Hlavn\u00ED m\u011Bsto Praha", "1"]
a.last => ["EuroOil", "Prun\u00E9\u0159ov ", "U\u0161\u00E1k", "Zat\u00EDm nezadan\u00FD kraj", "181"]

a.last.each { |i| puts i.encode("utf-8") } produces:

EuroOil
Prunérov
Usák
Zatím nezadaný kraj
181
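In the output above, ř and š lost their carons while é and á survived, which suggests that the console substituted characters it could not display; that is an assumption, since the thread does not confirm it. One way to check the data without going through the console is to write it to a UTF-8 file and read it back. The sketch below uses a hypothetical temporary file name and a shortened copy of the sample row.

```ruby
require "tmpdir"

# Shortened copy of the last row from the question.
row = ["EuroOil", "Prun\u00E9\u0159ov ", "U\u0161\u00E1k"]

# Hypothetical file name, placed in the system temp directory.
path = File.join(Dir.tmpdir, "ceskybenzin_sample.txt")

# Write and read with an explicit UTF-8 encoding, bypassing the console entirely.
File.write(path, row.join("\n"), encoding: "UTF-8")
read_back = File.read(path, encoding: "UTF-8").split("\n")
File.delete(path)

round_trip_ok = (read_back == row)
puts round_trip_ok
```

Opening such a file in an editor that understands UTF-8 shows whether the carons are really present in the data.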

