Read CSV file encoding error

I am using the following method to read CSV file content:

    /// <summary>
    /// Reads data from a CSV file to a datatable
    /// </summary>
    /// <param name="filePath">Path to the CSV file</param>
    /// <returns>Datatable filled with data read from the CSV file</returns>
    public DataTable ReadCsv(string filePath)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            log.Error("Invalid CSV file name.");
            return null;
        }

        try
        {
            DataTable dt = new DataTable();

            string folder = FileMngr.Instance.ExtractFileDir(filePath);
            string fileName = FileMngr.Instance.ExtractFileName(filePath);
            string connectionString = 
            string.Concat(@"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=",
            folder, ";");

            using (OdbcConnection conn = 
                   new System.Data.Odbc.OdbcConnection(connectionString))
            {
                string selectCommand = string.Concat("select * from [", fileName, "]");
                using (OdbcDataAdapter da = new OdbcDataAdapter(selectCommand, conn))
                {
                    da.Fill(dt);
                }
            }

            return dt;
        }
        catch (Exception ex)
        {
            log.Error("Error loading CSV content", ex);
            return null;
        }
    }

This method works if I have a UTF-8 encoded CSV file with a schema.ini that looks something like this:

[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
CharacterSet=ANSI

If I have German characters in a CSV file with Unicode encoding, the method cannot read the data correctly.

What modifications can I make to the above method so that it reads Unicode CSV files? If there is no way to do it this way, what CSV-reading code can you suggest?

Try using CharacterSet=UNICODE in your schema.ini file. Although this is not documented on MSDN, it works according to this thread on Microsoft Forums.
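As a sketch of that suggestion, here is the schema.ini from the question with only the CharacterSet line changed. It assumes the CSV file is saved as what Windows tools call "Unicode" (UTF-16); the file name and the other settings are copied from the question unchanged. Since the setting is reported as undocumented, treat it as something to test rather than a guarantee.

```ini
[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
; Changed from ANSI; undocumented value reported to work for Unicode files
CharacterSet=UNICODE
```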

Well, a very good and well-used streaming CSV reader is on CodeProject; that is the first thing I'd try... but it sounds like your encoding may be borked, which might not make it simple... Of course, it could just be ODBC that is breaking, in which case the above might work fine.

For simple CSV you could try parsing it yourself (string.Split etc.), but there are enough edge cases that a pre-rolled parser is worth using.
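To illustrate the hand-rolled route and the sort of edge case that a plain string.Split misses, here is a minimal sketch that reads the file through a StreamReader with an explicit encoding and fills a DataTable. The names SimpleCsv, SplitLine, Read and ReadFile are made up for this example. It assumes a header row with unique column names, handles quoted fields and doubled quotes, and does not handle fields that span several lines, so it is not a replacement for a tested parser.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original question or any library.
public static class SimpleCsv
{
    // Splits one CSV line into fields. A delimiter inside a double-quoted
    // field is kept, and a doubled quote ("") inside quotes becomes one quote.
    public static List<string> SplitLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field on the line
        return fields;
    }

    // Reads CSV text into a DataTable of string columns.
    // The first line is treated as the header; empty lines are skipped.
    public static DataTable Read(TextReader reader)
    {
        var table = new DataTable();
        string line = reader.ReadLine();
        if (line == null)
        {
            return table;
        }

        foreach (string name in SplitLine(line))
        {
            table.Columns.Add(name);
        }

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue;
            }

            List<string> fields = SplitLine(line);
            DataRow row = table.NewRow();
            // Extra fields beyond the header width are ignored.
            for (int i = 0; i < fields.Count && i < table.Columns.Count; i++)
            {
                row[i] = fields[i];
            }
            table.Rows.Add(row);
        }

        return table;
    }

    // Opens the file with the given encoding; a byte order mark, if present,
    // takes precedence over the encoding passed in.
    public static DataTable ReadFile(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding, true))
        {
            return Read(reader);
        }
    }
}
```

For a file saved as "Unicode" (UTF-16 little-endian), a call could look like `DataTable dt = SimpleCsv.ReadFile(filePath, Encoding.Unicode);`. Passing the encoding explicitly is the point of the sketch: the German characters are decoded by the StreamReader, so neither the ODBC text driver nor schema.ini is involved.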
