
How to fill PostgreSQL tables from C# DataSet via Npgsql?

I have a DataSet in C# with DataTables, and a PostgreSQL database with the same tables. I fill the DataTables in my code and want to INSERT them into the PostgreSQL database. I tried inserting with simple SQL queries (INSERT INTO ...), but it's very slow when I have hundreds of tables with thousands of rows. I guess using a DataAdapter would improve performance, but I can't understand how it works. Can you explain it to me with two example cases?

Case 1: inserting the DataSet's tables into PostgreSQL with a DataAdapter

Case 2: inserting only unique values from the DataSet into PostgreSQL (when a table in the database has rows with unique keys and the DataTable contains the same keys)

Or maybe you can suggest what to read to learn about DataAdapters... Anyway, thanks.

With the exception of trivially small datasets, you're going to have a hard time beating the performance of Npgsql's implementation of COPY, which can be accessed via the BeginTextImport method of your NpgsqlConnection object.

So, regardless of how the data exists in your application, if you dump the output via the text import (COPY), it should be very zippy. Here is an example of how you would do that with a DataTable. Bear in mind that the columns in the DataTable and the columns in the table have to line up -- if not, you need to manage that one way or another.

This presupposes Npgsql 3.1.9 or higher.

// Assumes conn is an open NpgsqlConnection and dt is your populated DataTable.
// One reusable buffer for the values of the current row.
object[] outRow = new object[dt.Columns.Count];

using (var writer = conn.BeginTextImport("copy <table> from STDIN WITH NULL AS '' CSV"))
{
    foreach (DataRow rw in dt.Rows)
    {
        for (int col = 0; col < dt.Columns.Count; col++)
            outRow[col] = rw[col];

        // DBNull renders as an empty string, which NULL AS '' maps back to NULL.
        // Caveat: values containing commas, quotes, or newlines need CSV escaping.
        writer.WriteLine(string.Join(",", outRow));
    }
}
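
For completeness, the conn above is just an open NpgsqlConnection; a minimal setup, with a placeholder connection string, might look like this:

using Npgsql;
using System.Data;

// Placeholder connection string -- substitute your own server, database, and credentials.
using (var conn = new NpgsqlConnection("Host=localhost;Database=mydb;Username=me;Password=secret"))
{
    conn.Open();
    // ... run the COPY loop from the example above against this connection ...
}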

As far as duplicates... wow, that really depends. Define "duplicates." If it's just a "select distinct," then it also depends on how many duplicates you expect. If it's a small amount, List<T>.Exists would probably be adequate, but if you have a large number of dupes, a Dictionary would make each lookup a lot more efficient. A typical List lookup is O(n), while a Dictionary lookup is O(1).

Here's a pretty brute-force example of a dictionary-based distinct insert for the above example:

object[] outRow = new object[dt.Columns.Count];
// Tracks CSV lines already written; only the keys matter, the bool is a dummy.
Dictionary<string, bool> already = new Dictionary<string, bool>();
bool test;

using (var writer = conn.BeginTextImport("copy <table> from STDIN WITH NULL AS '' CSV"))
{
    foreach (DataRow rw in dt.Rows)
    {
        for (int col = 0; col < dt.Columns.Count; col++)
            outRow[col] = rw[col];

        // The rendered CSV line itself serves as the dedup key.
        string output = string.Join(",", outRow);
        if (!already.TryGetValue(output, out test))
        {
            writer.WriteLine(output);
            already.Add(output, true);
        }
    }
}
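
Side note: if a HashSet<string> is available to you, it's arguably a cleaner fit than a Dictionary with throwaway bool values -- same O(1) lookups, and HashSet<T>.Add returns false when the item is already present, so the check and the insert collapse into one call. A minimal variant of the loop body:

HashSet<string> already = new HashSet<string>();

// Inside the foreach loop, replacing the Dictionary check:
string output = string.Join(",", outRow);
if (already.Add(output))    // Add returns false if this line was seen before
    writer.WriteLine(output);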

Disclaimer: This is a memory pig. If you can manage dupes any other way, or guarantee the ordering of the data, there are numerous other options.
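
One such option, since your case 2 implies a unique key on the database table, is to let the server do the dedup: COPY everything into a temporary staging table, then insert from it with ON CONFLICT DO NOTHING (PostgreSQL 9.5 or later). A rough sketch, with placeholder table names:

// Placeholder names throughout -- adjust to your schema.
using (var cmd = new NpgsqlCommand("CREATE TEMP TABLE staging (LIKE mytable INCLUDING DEFAULTS)", conn))
    cmd.ExecuteNonQuery();

// COPY the raw rows, dupes and all, into staging exactly as in the first example:
// conn.BeginTextImport("copy staging from STDIN WITH NULL AS '' CSV") ...

// Then let the unique key on mytable filter the duplicates server-side:
using (var cmd = new NpgsqlCommand(
    "INSERT INTO mytable SELECT DISTINCT * FROM staging ON CONFLICT DO NOTHING", conn))
    cmd.ExecuteNonQuery();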

If you can't (or won't) use a bulk copy insert, something that would help performance is wrapping your inserts in a transaction (NpgsqlTransaction), but for hundreds of thousands of rows I can't see why you would pass on COPY.
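
For reference, the transaction-wrapped version would look roughly like this -- table, column, and parameter names are placeholders, and this is still far slower than COPY for large row counts:

using (var tx = conn.BeginTransaction())
{
    using (var cmd = new NpgsqlCommand("INSERT INTO mytable (col1, col2) VALUES (@p1, @p2)", conn, tx))
    {
        cmd.Parameters.Add(new NpgsqlParameter("p1", NpgsqlTypes.NpgsqlDbType.Text));
        cmd.Parameters.Add(new NpgsqlParameter("p2", NpgsqlTypes.NpgsqlDbType.Integer));

        foreach (DataRow rw in dt.Rows)
        {
            cmd.Parameters["p1"].Value = rw[0];
            cmd.Parameters["p2"].Value = rw[1];
            cmd.ExecuteNonQuery();   // one round trip per row, but only one commit
        }
    }
    tx.Commit();
}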
