
Read a CSV file with special characters in Ruby and store into SQL Server

I'm trying to import a CSV file (UTF-8 encoded) with Ruby (2.0.0) into my database (MSSQL 2008 R2, collation French_CI_AS), but the special characters (French accented letters) are not stored properly: éèçôü becomes Ã©Ã¨Ã§Ã´Ã¼ (or similar gibberish).

I use this piece of code to read the file:

require 'csv'

# Read the semicolon-separated, UTF-8 encoded file row by row
CSV.foreach(file, col_sep: ';', encoding: "utf-8") do |row|
  # ...
end
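One way to confirm that the CSV step itself is fine is to check what Ruby actually gets back from CSV.foreach. The sketch below is a hypothetical diagnostic (the field_encoding_report helper and the sample data are illustrative, not from the question): it writes a small ';'-separated sample to a temporary file, reads it with the same options as above, and reports the encoding tag of every field and whether each field is valid in that encoding.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical diagnostic: list the distinct encodings of all non-nil fields
# read with the question's options, and whether every field is valid in its
# own encoding.
def field_encoding_report(path)
  encodings = []
  all_valid = true
  CSV.foreach(path, col_sep: ';', encoding: 'utf-8') do |row|
    row.compact.each do |field|
      encodings |= [field.encoding]
      all_valid &&= field.valid_encoding?
    end
  end
  { encodings: encodings, all_valid: all_valid }
end

# Write a small UTF-8 sample file and inspect what CSV returns for it.
Tempfile.create(['sample', '.csv']) do |f|
  f.write("id;title\n1;éèçôü\n")
  f.flush
  p field_encoding_report(f.path)
end
```

If the report shows only UTF-8 and all fields valid, the strings are correct when they leave the CSV reader.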

I tried various encodings in the CSV options (utf-8, iso-8859-1, windows-1252), but none of them stored the special characters correctly.
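A quick way to see why the other two encodings cannot work: the file's bytes are UTF-8, so declaring any other encoding makes Ruby decode each two-byte accented letter as two separate Latin characters. The sketch below (the title_as_declared helper and the sample row are hypothetical) decodes the same bytes under each of the three encodings tried in the question and transcodes them to UTF-8 by hand with force_encoding and encode, rather than through CSV's encoding: option. Only UTF-8 yields the correct text, which suggests that with utf-8 the strings are already right in Ruby and the corruption happens later.

```ruby
require 'csv'

# Raw bytes of one row as they sit in the UTF-8 file (hypothetical sample).
ROW_BYTES = "1;éèçôü\n".b

# Hypothetical helper: interpret the bytes in +declared+ encoding, transcode
# the text to UTF-8, then parse it and return the second field.
def title_as_declared(bytes, declared)
  text = bytes.dup.force_encoding(declared).encode(Encoding::UTF_8)
  CSV.parse(text, col_sep: ';').first[1]
end

%w[UTF-8 ISO-8859-1 Windows-1252].each do |enc|
  puts "#{enc}: #{title_as_declared(ROW_BYTES, enc)}"
end
# UTF-8: éèçôü
# ISO-8859-1: Ã©Ã¨Ã§Ã´Ã¼
# Windows-1252: Ã©Ã¨Ã§Ã´Ã¼
```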

Before you ask: my database collation supports these characters, since we have successfully imported data containing them using PHP importers. If I dump the data using puts or a file logger, everything is correct.
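Because puts output and log files are rendered by whatever displays them, a byte-level dump is a more reliable check than reading the text. The byte_dump helper below is hypothetical; it prints a string's encoding tag and its bytes in hex, so correct UTF-8 (é is c3 a9) can be told apart from text that has already been double-encoded (Ã© is c3 83 c2 a9). Comparing a dump taken just before the data leaves Ruby with what arrives on the receiving side would help narrow down where the corruption happens.

```ruby
# Hypothetical helper: show a string's encoding tag and its raw bytes in hex.
def byte_dump(str)
  hex = str.bytes.map { |b| format('%02x', b) }.join(' ')
  "#{str.encoding}: #{hex}"
end

puts byte_dump('é')   # UTF-8: c3 a9
puts byte_dump('Ã©')  # UTF-8: c3 83 c2 a9
```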

Is something wrong with my code, or do I need to specify something else (the encoding of the Ruby class file, for example)?
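On the source-file question: a magic comment such as `# encoding: utf-8` only sets the encoding of string literals written in that .rb file, and UTF-8 is already the default source encoding since Ruby 2.0. Strings read from a file get their encoding from the IO (for example its encoding: option) instead. The small sketch below illustrates the difference; the encoding_after_read helper and the temporary file are illustrative only.

```ruby
# encoding: utf-8
require 'tempfile'

# Hypothetical helper: write +content+ to a temporary file, read it back with
# the given IO encoding, and return the encoding Ruby tags the result with.
def encoding_after_read(content, io_encoding)
  Tempfile.create('enc') do |f|
    f.write(content)
    f.flush
    File.read(f.path, encoding: io_encoding).encoding
  end
end

puts __ENCODING__                                # UTF-8 (source encoding)
puts encoding_after_read('éèçôü', 'iso-8859-1')  # ISO-8859-1 (from the IO option)
```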

Thanks

EDIT: The data is saved by a PHP REST API that works fine with accented characters; it stores the data exactly as it receives it.

In Ruby, I parse my data, store it in an object, and then send the JSON-encoded object in the body of my PUT request. But the problem remains even if I run an SQL query directly from Ruby:
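Since the data reaches the PHP API as JSON, one thing worth ruling out is the request itself. The sketch below builds (but does not send) a PUT request whose body is produced by JSON.generate, which returns a UTF-8 string, and labels it with an explicit charset. The endpoint URL and the build_put_request helper are hypothetical, and whether the PHP API honours the charset parameter is not stated in the question.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical endpoint; the real API URL is not given in the question.
API_URI = URI('https://api.example.com/mytable/1')

# Build (but do not send) a PUT request whose JSON body is UTF-8 encoded and
# labelled as such, so the receiving API does not have to guess the charset.
def build_put_request(uri, payload)
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json; charset=utf-8'
  request.body = JSON.generate(payload)
  request
end

req = build_put_request(API_URI, { 'id' => 1, 'title' => 'éèçôü' })
puts req.body  # {"id":1,"title":"éèçôü"}
```

To actually send it, something like `Net::HTTP.start(API_URI.host, API_URI.port, use_ssl: true) { |http| http.request(req) }` would be used.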

query = <<-SQL
    UPDATE MyTable SET MyTable_title = '#{row_data['title']}' WHERE MyTable_id = '#{row_data['id']}'
SQL
res = db.execute query
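Interpolating row values straight into the SQL also breaks as soon as a title contains a single quote (for example "l'été"). A minimal sketch, assuming row_data is a Hash with 'title' and 'id' keys: the hypothetical sql_literal helper doubles single quotes and, by default, adds the T-SQL N prefix, which makes the literal NVARCHAR. The N prefix only matters for characters outside the collation's code page, so for accents covered by French_CI_AS it may make no difference; treat this mainly as the quoting fix, not as a confirmed cure for the encoding problem.

```ruby
# Hypothetical helper: quote a value as a T-SQL string literal, doubling any
# single quotes; unicode: true adds the N prefix (an NVARCHAR literal).
def sql_literal(value, unicode: true)
  (unicode ? "N'" : "'") + value.to_s.gsub("'", "''") + "'"
end

# Same UPDATE as in the question, built with quoted literals instead of raw
# interpolation. row_data is assumed to be a Hash with 'title' and 'id' keys.
def build_update(row_data)
  <<-SQL
    UPDATE MyTable SET MyTable_title = #{sql_literal(row_data['title'])}
    WHERE MyTable_id = #{sql_literal(row_data['id'], unicode: false)}
  SQL
end

puts build_update('title' => "l'été", 'id' => '42')
```

The resulting string would then be passed to db.execute as in the question; if the database library in use supports bound parameters, those are preferable to building the string by hand.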

I thought this had something to do with the encoding of your CSV file, so I started digging into that. I did find that windows-1252 encoding will insert control characters.
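One concrete place such control characters can come from: in Windows-1252 the bytes 0x80 to 0x9F are printable characters (€, curly quotes and so on), whereas ISO-8859-1 maps the same bytes to invisible C1 control codes. Text decoded with the wrong one of these two encodings can therefore pick up control characters. The decode_byte helper below is only an illustration, not part of the original answer.

```ruby
# Decode one byte under the given encoding and return it as a UTF-8 string.
def decode_byte(byte, encoding)
  byte.chr.force_encoding(encoding).encode(Encoding::UTF_8)
end

# Bytes 0x80-0x9F: printable in Windows-1252, C1 control codes in ISO-8859-1.
[0x80, 0x93, 0x94].each do |b|
  latin1 = decode_byte(b, Encoding::ISO_8859_1).ord
  cp1252 = decode_byte(b, Encoding::Windows_1252).ord
  printf("0x%02X  ISO-8859-1: U+%04X  Windows-1252: U+%04X\n", b, latin1, cp1252)
end
# 0x80  ISO-8859-1: U+0080  Windows-1252: U+20AC
# 0x93  ISO-8859-1: U+0093  Windows-1252: U+201C
# 0x94  ISO-8859-1: U+0094  Windows-1252: U+201D
```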

You can read more about it here: "Converting special charactes such as ü and à back to their original, latin alphbet counterparts in C#".
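The linked question deals with C#. A rough Ruby equivalent, for text that was UTF-8 but got decoded as Windows-1252, is to encode it back to Windows-1252 and reinterpret those bytes as UTF-8. The repair_mojibake helper below is a hypothetical, heuristic sketch: it leaves the input unchanged when that round trip fails or produces invalid UTF-8, it cannot recover every character, and fixing the decoding step is preferable to repairing stored data after the fact.

```ruby
# Hypothetical helper: undo UTF-8 text that was mis-decoded as Windows-1252.
# Returns the input unchanged if the round trip is impossible or invalid.
def repair_mojibake(text)
  candidate = text.encode(Encoding::Windows_1252).force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : text
rescue Encoding::UndefinedConversionError
  text
end

puts repair_mojibake('Ã©Ã¨Ã§Ã´Ã¼')  # éèçôü
puts repair_mojibake('éèçôü')       # éèçôü (already correct, left as is)
```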
