简体   繁体   中英

import from text file to SQL Server Database, is ADO.NET too slow?

My program is now still running to import data from a log file into a remote SQL Server Database. The log file is about 80MB in size and contains about 470000 lines, with about 25000 lines of data. My program can import only 300 rows/second, which is really bad. :(

public static int ImportData(string strPath)
{
    //NameValueCollection collection = ConfigurationManager.AppSettings;

    using (TextReader sr = new StreamReader(strPath))
    {
        sr.ReadLine(); //ignore three first lines of log file
        sr.ReadLine();
        sr.ReadLine();
        string strLine;
        var cn = new SqlConnection(ConnectionString);
        cn.Open();

        while ((strLine = sr.ReadLine()) != null)
        {
            {
                if (strLine.Trim() != "") //if not a blank line, then import into database
                {
                    InsertData(strLine, cn);
                    _count++;
                }
            }
        }
        cn.Close();
        sr.Close();

        return _count;
    }
}

InsertData is just a normal insert method using ADO.NET. It uses a parsing method:

public Data(string strLine)
{
    string[] list = strLine.Split(new[] {'\t'});
    try
    {
        Senttime = DateTime.Parse(list[0] + " " + list[1]);
    }
    catch (Exception)
    {
    }

    Clientip = list[2];
    Clienthostname = list[3];

    Partnername = list[4];
    Serverhostname = list[5];
    Serverip = list[6];

    Recipientaddress = list[7];
    Eventid = Convert.ToInt16(list[8]);
    Msgid = list[9];
    Priority = Convert.ToInt16(list[10]);
    Recipientreportstatus = Convert.ToByte(list[11]);
    Totalbytes = Convert.ToInt32(list[12]);
    Numberrecipient = Convert.ToInt16(list[13]);
    DateTime temp;
    if (DateTime.TryParse(list[14], out temp))
    {
        OriginationTime = temp;
    }
    else
    {
        OriginationTime = null;
    }
    Encryption = list[15];
    ServiceVersion = list[16];
    LinkedMsgid = list[17];
    MessageSubject = list[18];
    SenderAddress = list[19];
}

InsertData method:

private static void InsertData(string strLine, SqlConnection cn)
{
    var dt = new Data(strLine); //parse the log line into proper fields 
    const string cnnStr =
        "INSERT INTO LOGDATA ([SentTime]," + "[client-ip]," +
        "[Client-hostname]," + "[Partner-Name]," + "[Server-hostname]," +
        "[server-IP]," + "[Recipient-Address]," + "[Event-ID]," + "[MSGID]," +
        "[Priority]," + "[Recipient-Report-Status]," + "[total-bytes]," +
        "[Number-Recipients]," + "[Origination-Time]," + "[Encryption]," +
        "[service-Version]," + "[Linked-MSGID]," + "[Message-Subject]," +
        "[Sender-Address]) " + " VALUES (     " + "@Senttime," + "@Clientip," +
        "@Clienthostname," + "@Partnername," + "@Serverhostname," + "@Serverip," +
        "@Recipientaddress," + "@Eventid," + "@Msgid," + "@Priority," +
        "@Recipientreportstatus," + "@Totalbytes," + "@Numberrecipient," +
        "@OriginationTime," + "@Encryption," + "@ServiceVersion," +
        "@LinkedMsgid," + "@MessageSubject," + "@SenderAddress)";


    var cmd = new SqlCommand(cnnStr, cn) {CommandType = CommandType.Text};

    cmd.Parameters.AddWithValue("@Senttime", dt.Senttime);
    cmd.Parameters.AddWithValue("@Clientip", dt.Clientip);
    cmd.Parameters.AddWithValue("@Clienthostname", dt.Clienthostname);
    cmd.Parameters.AddWithValue("@Partnername", dt.Partnername);
    cmd.Parameters.AddWithValue("@Serverhostname", dt.Serverhostname);
    cmd.Parameters.AddWithValue("@Serverip", dt.Serverip);
    cmd.Parameters.AddWithValue("@Recipientaddress", dt.Recipientaddress);
    cmd.Parameters.AddWithValue("@Eventid", dt.Eventid);
    cmd.Parameters.AddWithValue("@Msgid", dt.Msgid);
    cmd.Parameters.AddWithValue("@Priority", dt.Priority);
    cmd.Parameters.AddWithValue("@Recipientreportstatus", dt.Recipientreportstatus);
    cmd.Parameters.AddWithValue("@Totalbytes", dt.Totalbytes);
    cmd.Parameters.AddWithValue("@Numberrecipient", dt.Numberrecipient);
    if (dt.OriginationTime != null)
        cmd.Parameters.AddWithValue("@OriginationTime", dt.OriginationTime);
    else
        cmd.Parameters.AddWithValue("@OriginationTime", DBNull.Value);
            //if OriginationTime was null, then insert with null value to this column
    cmd.Parameters.AddWithValue("@Encryption", dt.Encryption);
    cmd.Parameters.AddWithValue("@ServiceVersion", dt.ServiceVersion);
    cmd.Parameters.AddWithValue("@LinkedMsgid", dt.LinkedMsgid);
    cmd.Parameters.AddWithValue("@MessageSubject", dt.MessageSubject);
    cmd.Parameters.AddWithValue("@SenderAddress", dt.SenderAddress);
    cmd.ExecuteNonQuery();
}

How can my program run faster? Thank you so much!

Use SqlBulkCopy .

Edit: I created a minimal implementation of IDataReader and created a Batch type so that I could insert arbitrary in-memory data using SqlBulkCopy . Here is the important bit:

IDataReader dr = batch.GetDataReader();
using (SqlTransaction tx = _connection.BeginTransaction())
{
    try
    {
        using (SqlBulkCopy sqlBulkCopy =
            new SqlBulkCopy(_connection, SqlBulkCopyOptions.Default, tx))
        {
            sqlBulkCopy.DestinationTableName = TableName;
            SetColumnMappings(sqlBulkCopy.ColumnMappings);
            sqlBulkCopy.WriteToServer(dr);
            tx.Commit();
        }
    }
    catch
    {
        tx.Rollback();
        throw;
    }
}

The rest of the implementation is left as an exercise for the reader :)

Hint: the only bits of IDataReader you need to implement are Read , GetValue and FieldCount .

Hmmm, let's break this down a little bit.

In pseudocode what you did is the ff:

  1. Open the file
    • Open a connection
    • For every line that has data:
    • Parse the string
    • Save the data in SQL Server
    • Close the connection
    • Close the file

Now the fundamental problems in doing it this way are:

  • You are keeping a SQL connection open while waiting for your line parsing (pretty susceptible to timeouts and stuff)
  • You might be saving the data line by line, each in its own transaction. We won't know until you show us what the InsertData method is doing
  • Consequently you are keeping the file open while waiting for SQL to finish inserting

The optimal way of doing this is to parse the file as a whole, and then insert them in bulk. You can do this with SqlBulkCopy (as suggested by Matt Howells), or with SQL Server Integration Services.

If you want to stick with ADO.NET, you can pool together your INSERT statements and then pass them off into one large SQLCommand, instead of doing it this way eg, setting up one SQLCommand object per insert statement.

You create the SqlCommand object for every row of data. The simplest improvement would therefore to create a

private static SqlCommand cmdInsert

and declare the parameters with the Parameters.Add() method. Then for each data row, set the parameter values using

cmdInsert.Parameters["@paramXXX"].Value = valueXXX;

A second performance improvement might be to skip creation of Data objects for each row, and assign Parameter values directly from the list[] array.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM