简体   繁体   中英

Using Filehelpers, import large CSV file into SQL2014 table

The CSV file is comma delimited, contains embedded delimiters and quotes. Some of the fields have a beginning and ending quote and some do not.

The first record is handled perfectly, but not the second record. As you can see the field appears to have a leading quote, but it is actually embedded. The field5 does not have a trailing quote. The result of the import puts blanks in fields 5 and 6 and pushes the field5 data (in bold) into field7 which, later on in the process, causes a Max Field Length violation.

Is there an attribute setting in Filehelpers that I could use to handle the record containing the field that is in bold letters below so that the record imports each field properly? The CSV file is received from an external source so I cannot control the feed.

AT2M-2471-3,,"1178",AccuTemp,48"" Solid Cutting Board (Must be ordered with AT2A-2630-3 or AT2A-2630-22),,ea,"10.00","0.00000","207.00","93.41","0.00","0.00","0.00","0.00",ATCUT,"","1",each,"Cutting Board, Equipment Mounted",Accutemp,"","False",,85,"",,"0","baab3369-bcad-453e-9867-921e4af1203c","",Accutemp,,"","e0fb1dfb-c00d-dd11-a23a-00304834a8c9","bcd6e7a0-be0d-dd11-a23a-00304834a8c9"

AT2M-2877-1,,"1178",AccuTemp, ""U"" channel for connecting two 29"" A Depth griddles ,,ea,"4.00","0.00000","104.00","46.93","0.00","0.00","0.00","0.00",AT2M,"","1",each,,Accutemp,"","False",,85,"",,"0","f7d56cb1-b2ab-40c7-b7e5-55ee1b4d1023","",Accutemp,,"","e3fb1dfb-c00d-dd11-a23a-00304834a8c9","bcd6e7a0-be0d-dd11-a23a-00304834a8c9"

Here is the SQL table structure, no indexes:

    CREATE TABLE [dbo].[rawdata](
        [Model Number] [varchar](50) NULL,
        [User Stock Model Number] [varchar](50) NULL,
        [Vendor Number] [varchar](50) NULL,
        [Vendor Name] [varchar](50) NULL,
        [Specification] [varchar](max) NULL,
        [Vendor Pack] [varchar](50) NULL,
        [Selling Unit] [varchar](50) NULL,
        [Weight] [varchar](50) NULL,
        [Cube] [varchar](50) NULL,
        [List Price] [varchar](50) NULL,
        [Net Price] [varchar](50) NULL,
        [Height] [varchar](50) NULL,
        [Width] [varchar](50) NULL,
        [Depth] [varchar](50) NULL,
        [Deal Net] [varchar](50) NULL,
        [Picture Name] [varchar](150) NULL,
        [Blank Column] [varchar](50) NULL,
        [Vendor to Stock] [varchar](50) NULL,
        [Priced By] [varchar](50) NULL,
        [Category] [varchar](75) NULL,
        [Vendor Nickname] [varchar](50) NULL,
        [User Vendor Name] [varchar](50) NULL,
        [Configurable?] [varchar](50) NULL,
        [Category Values] [varchar](max) NULL,
        [Freight Class] [varchar](50) NULL,
        [Vendor FOB] [varchar](50) NULL,
        [Ship from Zip] [varchar](50) NULL,
        [Model Apply] [varchar](50) NULL,
        [Picture Link] [varchar](50) NULL,
        [Category Code] [varchar](50) NULL,
        [Vendor Short Name] [varchar](50) NULL,
        [Cutsheet Name] [varchar](150) NULL,
        [Cutsheet Link] [varchar](50) NULL,
        [Product ID] [varchar](50) NULL,
        [Vendor ID] [varchar](50) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

I created the class for the table with the attributes: [DelimitedRecord(",")] [IgnoreFirst(1)]

class rawdata
{
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Model_Number;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string User_Stock_Model_Number;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Number;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Specification;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Pack;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Selling_Unit;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Weight;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Cube;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string List_Price;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Net_Price;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Height;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Width;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Depth;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Deal_Net;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Picture_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Blank_Column;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_to_Stock;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Priced_By;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Category;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Nickname;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string User_Vendor_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Configurable;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Category_Values;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Freight_Class;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_FOB;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Ship_from_Zip;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Model_Apply;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Picture_Link;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Category_Code;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Short_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Cutsheet_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Cutsheet_Link;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Product_ID;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_ID;  

}

Here is the C# code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;
    using System.Data;
    using System.Data.SqlClient;
    using System.Windows.Forms;
    using FileHelpers;

    namespace XYZ
    {
        class Class1
        {
            static void Main(string[] args)
            {
                SqlConnection conn1 = new SqlConnection();
                DataTable temp_rawdata_table = new DataTable();

                conn1.ConnectionString = "Data Source=ABC;Initial Catalog=XYZ;Integrated Security=True";

                System.Diagnostics.Stopwatch elapsed = new System.Diagnostics.Stopwatch();
                elapsed.Start(); Int64 rows = 0;

                // ================ Begin BulkCopy ========================
                using (SqlBulkCopy bulkcopy = new SqlBulkCopy(conn1.ConnectionString,
                    System.Data.SqlClient.SqlBulkCopyOptions.TableLock)
                    {
                        DestinationTableName = "rawdata",
                        BulkCopyTimeout = 0,
                        BatchSize = 100000
                    })
                {
                    temp_rawdata_table = new XYZDataSet.rawdataDataTable();

                    // using the ASYNC engine allows for processing record by record
                    FileHelperAsyncEngine engine = new FileHelperAsyncEngine(typeof(rawdata));
                    engine.BeginReadFile("C:\\rawdata.csv");

                    int batchsize = 0;

                    Console.WriteLine("Copying data to table.");
                    // The Async engines are IEnumerable
                    foreach (rawdata aqtext in engine)
                    {
                        //create a new update row for aq360productsraw table
                        DataRow rawdata_update_row = temp_rawdata_table.NewRow();

                        rawdata_update_row["Model Number"] = aqtext.Model_Number.Trim();
                        rawdata_update_row["User Stock Model Number"] = aqtext.User_Stock_Model_Number.Trim();
                        rawdata_update_row["Vendor Number"] = aqtext.Vendor_Number.Trim();
                        rawdata_update_row["Vendor Name"] = aqtext.Vendor_Name.Trim();
                        rawdata_update_row["Specification"] = aqtext.Specification.Trim();
                        rawdata_update_row["Vendor Pack"] = aqtext.Vendor_Pack.Trim();
                        rawdata_update_row["Selling Unit"] = aqtext.Selling_Unit.Trim();
                        rawdata_update_row["Weight"] = aqtext.Weight.Trim();
                        rawdata_update_row["Cube"] = aqtext.Cube.Trim();
                        rawdata_update_row["List Price"] = aqtext.List_Price.Trim();
                        rawdata_update_row["Net Price"] = aqtext.Net_Price.Trim();
                        rawdata_update_row["Height"] = aqtext.Height.Trim();
                        rawdata_update_row["Width"] = aqtext.Width.Trim();
                        rawdata_update_row["Depth"] = aqtext.Depth.Trim();
                        rawdata_update_row["Deal Net"] = aqtext.Deal_Net.Trim();
                        rawdata_update_row["Picture Name"] = aqtext.Picture_Name.Trim();
                        rawdata_update_row["Blank Column"] = aqtext.Blank_Column.Trim();
                        rawdata_update_row["Vendor to Stock"] = aqtext.Vendor_to_Stock.Trim();
                        rawdata_update_row["Priced By"] = aqtext.Priced_By.Trim();
                        rawdata_update_row["Category"] = aqtext.Category.Trim();
                        rawdata_update_row["Vendor Nickname"] = aqtext.Vendor_Nickname.Trim();
                        rawdata_update_row["User Vendor Name"] = aqtext.User_Vendor_Name.Trim();
                        rawdata_update_row["Configurable?"] = aqtext.Configurable.Trim();
                        rawdata_update_row["Category Values"] = aqtext.Category_Values.Trim();
                        rawdata_update_row["Freight Class"] = aqtext.Freight_Class.Trim();
                        rawdata_update_row["Vendor FOB"] = aqtext.Vendor_FOB.Trim();
                        rawdata_update_row["Ship from Zip"] = aqtext.Ship_from_Zip.Trim();
                        rawdata_update_row["Model Apply"] = aqtext.Model_Apply.Trim();
                        rawdata_update_row["Picture Link"] = aqtext.Picture_Link.Trim();
                        rawdata_update_row["Category Code"] = aqtext.Category_Code.Trim();
                        rawdata_update_row["Vendor Short Name"] = aqtext.Vendor_Short_Name.Trim();
                        rawdata_update_row["Cutsheet Name"] = aqtext.Cutsheet_Name.Trim();
                        rawdata_update_row["Cutsheet Link"] = aqtext.Cutsheet_Link.Trim();
                        rawdata_update_row["Product ID"] = aqtext.Product_ID.Trim();
                        rawdata_update_row["Vendor ID"] = aqtext.Vendor_ID.Trim();


                        temp_rawdata_table.Rows.Add(rawdata_update_row);

                        batchsize += 1;
                        if (batchsize == 100000)
                        {
                            bulkcopy.WriteToServer(temp_rawdata_table);
                            temp_rawdata_table.Rows.Clear();
                            batchsize = 0;
                            Console.WriteLine("Flushing 100,000 rows");
                        }

                        rows += 1;

                        Console.WriteLine(rows.ToString() + "    " + aqtext.Model_Number.Trim() + Environment.NewLine);
                    }


                    bulkcopy.WriteToServer(temp_rawdata_table);
                    temp_rawdata_table.Rows.Clear();

                    engine.Close();
                }
                elapsed.Stop();
                Console.WriteLine((rows + " records imported in " +  elapsed.Elapsed.TotalSeconds + " seconds."));
            }
        }
    }

The problem, as also mentioned by @MarcosMeli, is that this is an invalid CSV file. And not just that one field. Even the row that you think is working isn't really working. It seems that whoever created this CSV file did it backwards in terms which fields should be text-qualified (ie "quoted") and which don't need it. They have numeric fields being text-qualified and the text field non-qualified.

The reason why row 1 worked is that text-qualification looks at the first and last characters of the field. In row 1, the escaped-quote (ie double-double-quotes) is not the first character, so it got imported, I suspect, as duplicate double-quotes. Yet in row 2, the beginning text of that field is quoted, hence the first character is a quote, and they then escaped that by duplicating the double-quotes. It is very sloppily done, and even getting FileHelpers to work with it now does not give much confidence in it continuing to work properly, especially if the non-text-qualified text fields ever have commas in them. In that case it would again cause unexpected shifts in the fields. I know you said that the CSV file comes from an external source and you cannot control it, but you really need to attempt to get it fixed as it is completely wrong. It is a bug in whatever system is producing it and it needs to be fixed.

For the moment you can set all of the text fields to non-text-qualified. But then you will likely need to add a step to replace all double double-quotes with a single double-quote.


Data format issues aside, and without taking anything away from FileHelpers as it does look like an interesting and useful library, I will say that you don't need FileHelpers in order to read a text file, row by row (minimal memory footprint) and batch it into SQL Server. In fact, you could do all of that PLUS:

  • skip the step of having a separate staging table (ie [rawdata] ) and instead send the rows directly into a synchronization stored procedure
  • do basic datatype validation in the app layer and send a strongly-typed row of data (rather than passing in all VARCHAR / NVARCHAR fields).

How so? By using Table-Valued Parameters and by passing it in using the IEnumerable<SqlDataRecord> method (and not the DataTable method). I have detailed the technique in a few answers here:

The problem is that the CSV is invalid, you can escape quotes only if the field is quoted

The value:

,""U"" channel for connecting two 29"" A Depth griddles,

to be properly parsed it must be

,"""U"" channel for connecting two 29"" A Depth griddles",

What about to remove the FieldQuoted from the Specification field ?

  public string Specification;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM