
How to process Unicode CSV files in Ruby CSV correctly?

I am trying to process some Google AdWords CSV files. The files are available in UNICODE format. When I parse a file with the Ruby CSV parser, I cannot read it: the characters display as \x00a \x00b etc.

I ended up having to open the file in OpenOffice, choose UTF-8 to render it, and save it again. After that, Ruby CSV can process the file. I also have to remove the first character in the CSV file, which looks like the number 8 in a black circle, because it is not a valid UTF-8 character. This special character was a result of the UNICODE to UTF-8 conversion in OpenOffice.

So what is the best way to convert the CSV file to a Ruby-friendly encoding without illegal characters?

To see what I mean, try opening this file with Ruby CSV and parsing its lines.

https://github.com/zben/encoding_test/blob/master/encoding_test.csv

This page suggests using Iconv.iconv for the conversion:

require 'iconv'
doc = Iconv.iconv('UTF-8', 'UTF-16', doc).join  # Iconv.iconv returns an Array of strings
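
Iconv was removed from Ruby's standard library in Ruby 2.0 and now lives in a separate gem, so on current Rubies the same conversion can probably be done with the built-in String#encode instead. The following is only a sketch under some assumptions: the export is UTF-16 (little-endian when there is no byte order mark), the columns are tab-separated, and the helper name utf16_to_utf8 and the sample data are made up for illustration. It also drops the leading byte order mark, which is most likely the stray first character described in the question.

```ruby
require 'csv'

# Hypothetical helper (not from the original post): convert raw UTF-16
# bytes to a UTF-8 string, dropping the byte order mark (BOM) if present.
def utf16_to_utf8(raw_bytes)
  bytes = raw_bytes.b # binary copy, so the raw bytes can be inspected
  if bytes.start_with?("\xFF\xFE".b)
    source = Encoding::UTF_16LE
    bytes = bytes.byteslice(2..-1)
  elsif bytes.start_with?("\xFE\xFF".b)
    source = Encoding::UTF_16BE
    bytes = bytes.byteslice(2..-1)
  else
    source = Encoding::UTF_16LE # assumption: no BOM means little-endian
  end
  # Interpret the bytes as `source` and transcode them to UTF-8.
  bytes.encode(Encoding::UTF_8, source)
end

# Made-up sample standing in for the real export (BOM + tab-separated rows).
sample = "\uFEFFCampaign\tClicks\nBrand\t12\n".encode(Encoding::UTF_16LE)

text = utf16_to_utf8(sample)
rows = CSV.parse(text, col_sep: "\t") # assumption: tab-separated columns
rows.each { |row| p row }
```

For a file on disk, the raw bytes could be read with File.binread('encoding_test.csv') and passed to the helper. If the file turns out to be comma-separated, drop the col_sep option.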

