
Best way to insert 50K records into a big table (approx. 350 columns) in PostgreSQL

Using C#, I'm trying to insert the contents of a CSV file (~55,000 records) into a PostgreSQL table with 350 columns. I'm simply building each INSERT statement from the comma-separated header and record, like this:

INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
....etc

I tried batching, i.e. gathering 1000 statements and executing them in one transaction, but this still takes up to ~3 seconds per 1000 records.
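(For reference, this kind of batching can be sent to the server in a single round trip with Npgsql's NpgsqlBatch, available in Npgsql 6+. The sketch below only illustrates the idea, not the original code; ExecuteInsertBatchAsync is a placeholder name.)

using System.Collections.Generic;
using System.Threading.Tasks;
using Npgsql;

// Executes a list of pre-built INSERT statements in one transaction and one round trip.
static async Task ExecuteInsertBatchAsync(NpgsqlConnection conn, IReadOnlyList<string> statements)
{
    await using var tx = await conn.BeginTransactionAsync();

    await using var batch = new NpgsqlBatch(conn, tx);
    foreach (var sql in statements)
        batch.BatchCommands.Add(new NpgsqlBatchCommand(sql));

    await batch.ExecuteNonQueryAsync(); // all statements travel to the server together
    await tx.CommitAsync();
}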

I also tried to follow the Microsoft example (linked here), calling ExecuteNonQuery() for each line but committing the transaction after every 1000 records and then beginning a new transaction. This also takes up to ~3 seconds per 1000 records. Something like this:

foreach (var line in ReadCsvFileToList(dataFile))
{
    try
    {
        if (firstLine)
        {
            // The first line is the CSV header; it becomes the column list.
            header = line;
            firstLine = false;
            continue;
        }
        else
        {
            // Turn the raw CSV line into a SQL value list: "" -> NULL, escape quotes.
            var formattedLine = line.Replace("\"\"", "NULL").Replace("'", "''").Replace("\"", "'");
            var commandText = $"INSERT INTO public.table({header.Replace('/', '_')}) VALUES ({formattedLine})";
            command.CommandText = commandText;
            await command.ExecuteNonQueryAsync();
            round++;
        }

        if (round == 1000) // batch size
        {
            await transaction.CommitAsync();
            Console.WriteLine("batch committed to DB at: " + DateTime.Now.ToString("ddMMyyyy HH:mm:ss"));
            round = 0;
            transaction = connection.BeginTransaction();
        }
    }
    catch (Exception)
    {
        // Roll back before closing/disposing the connection, otherwise the rollback would fail.
        if (transaction != null)
            await transaction.RollbackAsync();
        await connection.CloseAsync();
        await connection.DisposeAsync();
        throw;
    }
}
// Note: rows left in a partial batch (< 1000) still need a final CommitAsync() after the loop.

Any idea on how to optimize this further?

The quickest way to load .csv files into a DB is to do it on the DB server, in the DBMS itself.

C# will never get close to that performance; at best you add the overhead of transferring all the data over the network once. If you have to do it from code, there is usually a proper bulk-insert mechanism. For PostgreSQL that is COPY.
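For example, with Npgsql the CSV can be streamed through COPY ... FROM STDIN. This is only a minimal sketch: BulkCopyCsvAsync is a placeholder name, and public.table plus the header handling are taken from the question.

using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Npgsql;

static async Task BulkCopyCsvAsync(string connectionString, string dataFile)
{
    await using var conn = new NpgsqlConnection(connectionString);
    await conn.OpenAsync();

    // Reuse the CSV header (with '/' replaced by '_', as in the question) as the COPY column list.
    var header = File.ReadLines(dataFile).First().Replace('/', '_');

    // COPY ... FROM STDIN streams all rows to the server in a single operation.
    using var writer = conn.BeginTextImport(
        $"COPY public.table ({header}) FROM STDIN (FORMAT csv)");

    foreach (var line in File.ReadLines(dataFile).Skip(1)) // skip the header row
        writer.WriteLine(line);                            // lines are already CSV-formatted

    // Disposing the writer completes the COPY; an exception aborts it.
}

Note that in CSV COPY, unquoted empty fields become NULL while quoted empty strings ("") are loaded as empty strings, so reproducing the ""-to-NULL mapping from the question may require the FORCE_NULL option or some preprocessing.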

If that is not an option either, at least use an explicit DB transaction. Doing INSERTs in a loop makes the DB open and commit a transaction for every single INSERT; wrapping the loop in one explicit transaction can improve performance at least somewhat.
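If you stay with row-by-row INSERTs, a rough sketch of that with Npgsql could look like the following. InsertRowsInOneTransactionAsync, columns and rows are placeholder names; the single reusable parameterized command (with server-inferred parameter types) is an extra optimization on top of the explicit transaction, and it assumes the CSV values are valid text representations of the column types.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Npgsql;
using NpgsqlTypes;

// Hypothetical helper: one explicit transaction for the whole load, plus a single
// parameterized INSERT that is reused for every row instead of concatenating SQL.
static async Task InsertRowsInOneTransactionAsync(
    string connectionString, string[] columns, IEnumerable<string[]> rows)
{
    await using var conn = new NpgsqlConnection(connectionString);
    await conn.OpenAsync();
    await using var tx = await conn.BeginTransactionAsync();

    var columnList = string.Join(",", columns);
    var placeholders = string.Join(",", columns.Select((_, i) => $"${i + 1}"));
    await using var cmd = new NpgsqlCommand(
        $"INSERT INTO public.table ({columnList}) VALUES ({placeholders})", conn, tx);

    // NpgsqlDbType.Unknown lets the server infer each column's type from the text value.
    foreach (var _ in columns)
        cmd.Parameters.Add(new NpgsqlParameter { NpgsqlDbType = NpgsqlDbType.Unknown });

    foreach (var row in rows)
    {
        for (var i = 0; i < columns.Length; i++)
            cmd.Parameters[i].Value = string.IsNullOrEmpty(row[i]) ? (object)DBNull.Value : row[i];
        await cmd.ExecuteNonQueryAsync();
    }

    await tx.CommitAsync(); // a single commit instead of one per batch of INSERTs
}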
