Read CSV file encoding error

I am using the following method to read the contents of a CSV file:

    /// <summary>
    /// Reads data from a CSV file to a datatable
    /// </summary>
    /// <param name="filePath">Path to the CSV file</param>
    /// <returns>Datatable filled with data read from the CSV file</returns>
    public DataTable ReadCsv(string filePath)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            log.Error("Invalid CSV file name.");
            return null;
        }

        try
        {
            DataTable dt = new DataTable();

            string folder = FileMngr.Instance.ExtractFileDir(filePath);
            string fileName = FileMngr.Instance.ExtractFileName(filePath);
            string connectionString = string.Concat(
                @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=",
                folder, ";");

            using (OdbcConnection conn = 
                   new System.Data.Odbc.OdbcConnection(connectionString))
            {
                string selectCommand = string.Concat("select * from [", fileName, "]");
                using (OdbcDataAdapter da = new OdbcDataAdapter(selectCommand, conn))
                {
                    da.Fill(dt);
                }
            }

            return dt;
        }
        catch (Exception ex)
        {
            log.Error("Error loading CSV content", ex);
            return null;
        }
    }

This method works when I have a UTF-8 encoded CSV file with a schema.ini that looks something like this:

    [Example.csv]
    Format=Delimited(,)
    ColNameHeader=True
    MaxScanRows=2
    CharacterSet=ANSI

If the CSV file contains German characters and is saved with Unicode encoding, the method cannot read the data correctly.

What modifications can I make to the above method so that it reads Unicode CSV files? If that is not possible with this approach, what CSV-reading code can you suggest?

Try using CharacterSet=UNICODE in your schema.ini file. Although this value is not documented on MSDN, it works according to a thread on the Microsoft Forums.
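Applied to the schema.ini from the question, that change would look roughly like the fragment below. This is only a sketch: every entry except `CharacterSet` is copied from the question, and the `UNICODE` value is taken from the answer above. Because it is undocumented, whether the text driver honours it may depend on the driver version.

```ini
[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
CharacterSet=UNICODE
```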

Well, there is a very good and well-used streaming CSV reader on CodeProject; that is the first thing I'd try... but it sounds like your encoding may be borked, which might not make it simple. Of course, it could just be ODBC that is breaking, in which case the reader might work fine.

For simple CSV you could try parsing it yourself (string.Split etc.), but there are enough edge cases (quoted fields, embedded commas, line breaks inside values) that a pre-rolled parser is worth using.
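As a rough illustration of the do-it-yourself route, here is a minimal sketch that bypasses ODBC and reads the file through a `StreamReader` with an explicit encoding. `SimpleCsv` and `Read` are hypothetical names, not part of the question's code. It assumes the first line is a header with unique column names, and it does not handle quoted fields or embedded commas.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

public static class SimpleCsv
{
    // Hypothetical helper, not part of the original question's code.
    // Reads a simple comma-delimited file (no quoted fields, no embedded
    // commas or line breaks) into a DataTable using the given encoding.
    public static DataTable Read(string filePath, Encoding encoding)
    {
        DataTable dt = new DataTable();

        // Passing true lets a byte order mark in the file override 'encoding'.
        using (StreamReader reader = new StreamReader(filePath, encoding, true))
        {
            string headerLine = reader.ReadLine();
            if (headerLine == null)
            {
                return dt; // empty file: no columns, no rows
            }

            // Assumes the header names are unique; duplicates would throw.
            foreach (string name in headerLine.Split(','))
            {
                dt.Columns.Add(name.Trim(), typeof(string));
            }

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0)
                {
                    continue; // skip blank lines
                }

                string[] fields = line.Split(',');
                DataRow row = dt.NewRow();

                // Copy only as many fields as there are columns.
                int count = Math.Min(fields.Length, dt.Columns.Count);
                for (int i = 0; i < count; i++)
                {
                    row[i] = fields[i];
                }
                dt.Rows.Add(row);
            }
        }

        return dt;
    }
}
```

For a file saved as "Unicode" in Windows tools (UTF-16 little-endian), a call might look like `DataTable dt = SimpleCsv.Read(filePath, Encoding.Unicode);`. For anything beyond trivially simple files, a dedicated parser is still the safer choice.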
