SqlBulkCopy DataTables as they are added to a DataSet

I'd like to parse values from a csv into datatable chunks, add them to a dataset, and then use SqlBulkCopy to insert the datatables into a single table in SQL Server. The original csv can range from 4 GB to 8 GB, and I need to avoid reading the entire thing into memory, hence the chunking. I loosely based my chunking on this post. I use LumenWorks to parse the csv values.

As soon as a datatable is added to the dataset, I want to use SqlBulkCopy to insert it into my SQL table, all while the next datatable is being created. After the SqlBulkCopy completes, I want to remove the datatable to release the memory.

My first thought is to run the chunking method asynchronously without await (see the rough sketch after the first code block below), then run a while loop that checks for the existence of the next datatable in the dataset. If that datatable exists, bulk copy it. If its row count is less than the row limit, it is the last chunk and the while loop should stop.

Am I going about this the wrong way? If not, how can I do something like this?

        string filePath = @"C:\Users\user\Downloads\Testing\file - Copy.csv";
        DataSet ds = new DataSet();

        bool continueInsert = true;
        int rowLimit = 100000;
        int tableNumber = 0;

        //Start this, but do not wait for it to complete before starting while loop
        ChunkCSV(filePath, ds, rowLimit);

        //Run SqlBulkCopy if datatable exists 
        while (continueInsert)
        {
            if (ds.Tables.Contains("tbl_" + tableNumber))
            {
                DataTable dataTable = ds.Tables["tbl_" + tableNumber];

                //SqlBulkCopy dataTable code HERE

                if (ds.Tables["tbl_" + tableNumber].Rows.Count < rowLimit)
                {
                    continueInsert = false;
                }

                //Remove datatable from dataset to release memory
                ds.Tables.Remove("tbl_" + tableNumber);

                tableNumber++;
            }
            else
            {
                Thread.Sleep(1000);
            }
        }
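
To be clear, the plain ChunkCSV call above blocks until the whole file is parsed. What I mean by running it "asynchronously without await" is something like this rough sketch (using Task.Run, assuming a using directive for System.Threading.Tasks):

        //Start chunking on a background task and fall straight through to the while loop
        Task chunkTask = Task.Run(() => ChunkCSV(filePath, ds, rowLimit));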

Here is my chunking code:

    private static void ChunkCSV(string filePath, DataSet dataSet, int rowLimit)
    {
        char delimiter = ',';

        DataTable dtChunk = null;
        int tableNumber = 0;
        int chunkRowCount = 0;
        bool firstLineOfChunk = true;

        using (var sr = new StreamReader(filePath))
        using (CsvReader csv = new CsvReader(sr, false, delimiter, '\"', '\0', '\0', ValueTrimmingOptions.All, 65536))
        {
            int fieldCount = csv.FieldCount;
            string[] row = new string[fieldCount];

            //Add fields when necessary
            csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;

            while (csv.ReadNextRecord())
            {
                if (firstLineOfChunk)
                {
                    firstLineOfChunk = false;
                    dtChunk = CreateDataTable(fieldCount, tableNumber);
                }

                DataRow dataRow = dtChunk.NewRow();

                csv.CopyCurrentRecordTo(row);
                for (int f = 0; f < fieldCount; f++)
                {
                    dataRow[f] = row[f];
                }

                dtChunk.Rows.Add(dataRow);
                chunkRowCount++;

                if (chunkRowCount == rowLimit)
                {
                    firstLineOfChunk = true;
                    chunkRowCount = 0;
                    tableNumber++;
                    dataSet.Tables.Add(dtChunk);
                    dtChunk = null;
                }
            }
        }

        if (dtChunk != null)
        {
            dataSet.Tables.Add(dtChunk);
        }

    }

    private static DataTable CreateDataTable(int fieldCount, int tableNumber)
    {
        DataTable dt = new DataTable("tbl_" + tableNumber);

        for(int i = 0; i < fieldCount; i++)
        {
            dt.Columns.Add("Column_" + i);
        }

        return dt;
    }

There's no reason to use a DataTable to begin with.

Use the SqlBulkCopy.WriteToServer(IDataReader) overload and you can stream the whole file directly to SQL Server; the LumenWorks CsvReader already implements IDataReader, so you can pass it straight in. Use SqlBulkCopy.BatchSize if you don't want all the rows loaded in a single transaction.

eg

using (var sr = new StreamReader(filePath))
using (CsvReader csv = new CsvReader(sr, false, delimiter, '\"', '\0', '\0', ValueTrimmingOptions.All, 65536))
{
    bulkCopy.WriteToServer(csv);
}
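
Filled out a little, it would look roughly like this (assuming System.Data.SqlClient for SqlConnection/SqlBulkCopy; the connection string, destination table name and batch size are placeholders you'd substitute with your own values):

string connectionString = "<your connection string>";
char delimiter = ',';

using (var connection = new SqlConnection(connectionString))
using (var sr = new StreamReader(filePath))
using (CsvReader csv = new CsvReader(sr, false, delimiter, '\"', '\0', '\0', ValueTrimmingOptions.All, 65536))
using (var bulkCopy = new SqlBulkCopy(connection))
{
    //Add missing fields as empty strings, same as in the chunking code
    csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;

    connection.Open();
    bulkCopy.DestinationTableName = "dbo.YourTable"; //placeholder table name
    bulkCopy.BatchSize = 100000;                     //commit in batches instead of one big transaction
    bulkCopy.WriteToServer(csv);                     //streams rows straight from the csv reader
}

Note that without explicit ColumnMappings, SqlBulkCopy maps the csv columns to the destination table columns by ordinal position, so the file and the table need to line up.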
