Import from text file to SQL Server database, is ADO.NET too slow?

My program is still running, importing data from a log file into a remote SQL Server database. The log file is about 80 MB in size and contains about 470,000 lines, with about 25,000 lines of data. My program can import only 300 rows/second, which is really bad. :(

public static int ImportData(string strPath)
{
    //NameValueCollection collection = ConfigurationManager.AppSettings;

    using (TextReader sr = new StreamReader(strPath))
    {
        sr.ReadLine(); //ignore three first lines of log file
        sr.ReadLine();
        sr.ReadLine();
        string strLine;
        var cn = new SqlConnection(ConnectionString);
        cn.Open();

        while ((strLine = sr.ReadLine()) != null)
        {
            {
                if (strLine.Trim() != "") //if not a blank line, then import into database
                {
                    InsertData(strLine, cn);
                    _count++;
                }
            }
        }
        cn.Close();
        sr.Close();

        return _count;
    }
}

InsertData is just a normal insert method using ADO.NET. It uses a parsing method:

public Data(string strLine)
{
    string[] list = strLine.Split(new[] {'\t'});
    try
    {
        Senttime = DateTime.Parse(list[0] + " " + list[1]);
    }
    catch (Exception)
    {
    }

    Clientip = list[2];
    Clienthostname = list[3];

    Partnername = list[4];
    Serverhostname = list[5];
    Serverip = list[6];

    Recipientaddress = list[7];
    Eventid = Convert.ToInt16(list[8]);
    Msgid = list[9];
    Priority = Convert.ToInt16(list[10]);
    Recipientreportstatus = Convert.ToByte(list[11]);
    Totalbytes = Convert.ToInt32(list[12]);
    Numberrecipient = Convert.ToInt16(list[13]);
    DateTime temp;
    if (DateTime.TryParse(list[14], out temp))
    {
        OriginationTime = temp;
    }
    else
    {
        OriginationTime = null;
    }
    Encryption = list[15];
    ServiceVersion = list[16];
    LinkedMsgid = list[17];
    MessageSubject = list[18];
    SenderAddress = list[19];
}

InsertData method:

private static void InsertData(string strLine, SqlConnection cn)
{
    var dt = new Data(strLine); //parse the log line into proper fields 
    const string cnnStr =
        "INSERT INTO LOGDATA ([SentTime]," + "[client-ip]," +
        "[Client-hostname]," + "[Partner-Name]," + "[Server-hostname]," +
        "[server-IP]," + "[Recipient-Address]," + "[Event-ID]," + "[MSGID]," +
        "[Priority]," + "[Recipient-Report-Status]," + "[total-bytes]," +
        "[Number-Recipients]," + "[Origination-Time]," + "[Encryption]," +
        "[service-Version]," + "[Linked-MSGID]," + "[Message-Subject]," +
        "[Sender-Address]) " + " VALUES (     " + "@Senttime," + "@Clientip," +
        "@Clienthostname," + "@Partnername," + "@Serverhostname," + "@Serverip," +
        "@Recipientaddress," + "@Eventid," + "@Msgid," + "@Priority," +
        "@Recipientreportstatus," + "@Totalbytes," + "@Numberrecipient," +
        "@OriginationTime," + "@Encryption," + "@ServiceVersion," +
        "@LinkedMsgid," + "@MessageSubject," + "@SenderAddress)";


    var cmd = new SqlCommand(cnnStr, cn) {CommandType = CommandType.Text};

    cmd.Parameters.AddWithValue("@Senttime", dt.Senttime);
    cmd.Parameters.AddWithValue("@Clientip", dt.Clientip);
    cmd.Parameters.AddWithValue("@Clienthostname", dt.Clienthostname);
    cmd.Parameters.AddWithValue("@Partnername", dt.Partnername);
    cmd.Parameters.AddWithValue("@Serverhostname", dt.Serverhostname);
    cmd.Parameters.AddWithValue("@Serverip", dt.Serverip);
    cmd.Parameters.AddWithValue("@Recipientaddress", dt.Recipientaddress);
    cmd.Parameters.AddWithValue("@Eventid", dt.Eventid);
    cmd.Parameters.AddWithValue("@Msgid", dt.Msgid);
    cmd.Parameters.AddWithValue("@Priority", dt.Priority);
    cmd.Parameters.AddWithValue("@Recipientreportstatus", dt.Recipientreportstatus);
    cmd.Parameters.AddWithValue("@Totalbytes", dt.Totalbytes);
    cmd.Parameters.AddWithValue("@Numberrecipient", dt.Numberrecipient);
    if (dt.OriginationTime != null)
        cmd.Parameters.AddWithValue("@OriginationTime", dt.OriginationTime);
    else
        cmd.Parameters.AddWithValue("@OriginationTime", DBNull.Value);
            //if OriginationTime was null, then insert with null value to this column
    cmd.Parameters.AddWithValue("@Encryption", dt.Encryption);
    cmd.Parameters.AddWithValue("@ServiceVersion", dt.ServiceVersion);
    cmd.Parameters.AddWithValue("@LinkedMsgid", dt.LinkedMsgid);
    cmd.Parameters.AddWithValue("@MessageSubject", dt.MessageSubject);
    cmd.Parameters.AddWithValue("@SenderAddress", dt.SenderAddress);
    cmd.ExecuteNonQuery();
}

How can my program run faster? Thank you so much!

Use SqlBulkCopy.

Edit: I created a minimal implementation of IDataReader and created a Batch type so that I could insert arbitrary in-memory data using SqlBulkCopy. Here is the important bit:

IDataReader dr = batch.GetDataReader();
using (SqlTransaction tx = _connection.BeginTransaction())
{
    try
    {
        using (SqlBulkCopy sqlBulkCopy =
            new SqlBulkCopy(_connection, SqlBulkCopyOptions.Default, tx))
        {
            sqlBulkCopy.DestinationTableName = TableName;
            SetColumnMappings(sqlBulkCopy.ColumnMappings);
            sqlBulkCopy.WriteToServer(dr);
            tx.Commit();
        }
    }
    catch
    {
        tx.Rollback();
        throw;
    }
}

The rest of the implementation is left as an exercise for the reader :)

Hint: the only bits of IDataReader you need to implement are Read, GetValue and FieldCount.
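For readers who would rather not do the exercise from scratch, here is a rough sketch of what such a reader could look like. The name InMemoryDataReader and the object[]-per-row layout are my own assumptions, not the Batch type from the answer. Only Read, GetValue and FieldCount do real work; a few trivial members are filled in, and everything else throws. If your column mappings are by name rather than by ordinal, you may find that GetOrdinal or GetName are needed as well.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper (not from the answer): exposes in-memory rows,
// one object[] per row, as a forward-only IDataReader.
public sealed class InMemoryDataReader : IDataReader
{
    private readonly IEnumerator<object[]> _rows;
    private bool _closed;

    public InMemoryDataReader(IEnumerable<object[]> rows, int fieldCount)
    {
        _rows = rows.GetEnumerator();
        FieldCount = fieldCount;
    }

    // The three members the hint singles out.
    public int FieldCount { get; }
    public bool Read() => !_closed && _rows.MoveNext();
    public object GetValue(int i) => _rows.Current[i] ?? DBNull.Value; // null becomes DBNull

    // Cheap members that can be answered honestly.
    public object this[int i] => GetValue(i);
    public bool IsDBNull(int i) => GetValue(i) is DBNull;
    public int Depth => 0;
    public bool IsClosed => _closed;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public void Close() => _closed = true;
    public void Dispose() { Close(); _rows.Dispose(); }

    // Everything else is deliberately unsupported in this sketch.
    public object this[string name] => throw new NotSupportedException();
    public DataTable GetSchemaTable() => throw new NotSupportedException();
    public string GetName(int i) => throw new NotSupportedException();
    public int GetOrdinal(string name) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public Type GetFieldType(int i) => throw new NotSupportedException();
    public int GetValues(object[] values) => throw new NotSupportedException();
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length)
        => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length)
        => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
    public string GetString(int i) => throw new NotSupportedException();
}
```

An instance of this class can be passed wherever the snippet above passes dr to sqlBulkCopy.WriteToServer. Because the rows come from an IEnumerable, they can be produced lazily while the file is being read.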

Hmmm, let's break this down a little bit.

In pseudocode, what you did is the following:

  1. Open the file
  2. Open a connection
  3. For every line that has data:
    • Parse the string
    • Save the data in SQL Server
  4. Close the connection
  5. Close the file

Now the fundamental problems in doing it this way are:

  • You are keeping a SQL connection open while waiting for your line parsing (pretty susceptible to timeouts and the like)
  • You might be saving the data line by line, each in its own transaction. We won't know until you show us what the InsertData method is doing
  • Consequently, you are keeping the file open while waiting for SQL Server to finish inserting

The optimal way of doing this is to parse the file as a whole, and then insert the rows in bulk. You can do this with SqlBulkCopy (as suggested by Matt Howells), or with SQL Server Integration Services.
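As a rough illustration of the "parse first, insert later" idea, the sketch below builds an in-memory DataTable from the log lines. The LogParser name, the choice of only four of the question's columns, the use of the invariant culture, and the policy of skipping short lines are all assumptions made to keep the example small; the remaining columns would follow the same pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class LogParser
{
    // Builds an in-memory table from tab-separated log lines.
    // Field positions follow the question's Data constructor.
    public static DataTable ToDataTable(IEnumerable<string> lines)
    {
        var table = new DataTable("LOGDATA");
        table.Columns.Add("SentTime", typeof(DateTime));
        table.Columns.Add("client-ip", typeof(string));
        table.Columns.Add("Event-ID", typeof(short));
        table.Columns.Add("total-bytes", typeof(int));

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            string[] f = line.Split('\t');
            if (f.Length < 13) continue; // assumed policy: ignore short lines

            // An unparseable date becomes NULL instead of being silently swallowed.
            DateTime sent;
            object sentTime = DateTime.TryParse(f[0] + " " + f[1],
                CultureInfo.InvariantCulture, DateTimeStyles.None, out sent)
                ? (object)sent
                : DBNull.Value;

            table.Rows.Add(
                sentTime,
                f[2],
                short.Parse(f[8], CultureInfo.InvariantCulture),
                int.Parse(f[12], CultureInfo.InvariantCulture));
        }
        return table;
    }
}

// Possible use once the table is built (needs a live SQL Server connection):
//   using (var bulk = new SqlBulkCopy(connection))
//   {
//       bulk.DestinationTableName = "LOGDATA";
//       bulk.ColumnMappings.Add("SentTime", "SentTime");
//       // ...one mapping per column...
//       bulk.WriteToServer(table);
//   }
```

The connection is then only needed for the final WriteToServer call, so neither the file nor the connection is held open while the other is busy. The price is that the whole parsed file sits in memory, which should be acceptable for an 80 MB log.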

If you want to stick with ADO.NET, you can pool together your INSERT statements and pass them off in one large SqlCommand, instead of setting up one SqlCommand object per INSERT statement as you do now.
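One way to read "pool together your INSERT statements" is a single INSERT with a multi-row VALUES list, which needs SQL Server 2008 or later. The sketch below only builds the command text. The BatchInsert name, the @p{row}_{column} parameter naming, and the budget of 2000 parameters are my own choices. SQL Server documents a limit of 2100 parameters per command and 1000 rows per VALUES list, so the budget stays a little below the first limit on purpose.

```csharp
using System;
using System.Text;

public static class BatchInsert
{
    // Conservative budget, below SQL Server's documented 2100-parameter limit.
    private const int ParameterBudget = 2000;
    // A single VALUES list accepts at most 1000 rows.
    private const int MaxValuesRows = 1000;

    // How many rows fit into one command for a given column count.
    public static int MaxRowsPerBatch(int columnCount)
    {
        return Math.Min(MaxValuesRows, ParameterBudget / columnCount);
    }

    // Builds "INSERT INTO t ([a],[b]) VALUES (@p0_0,@p0_1),(@p1_0,@p1_1)...".
    // Table and column names are assumed to be trusted constants, not user input.
    public static string BuildInsert(string table, string[] columns, int rowCount)
    {
        var sql = new StringBuilder();
        sql.Append("INSERT INTO ").Append(table).Append(" ([")
           .Append(string.Join("],[", columns)).Append("]) VALUES ");

        for (int r = 0; r < rowCount; r++)
        {
            if (r > 0) sql.Append(',');
            sql.Append('(');
            for (int c = 0; c < columns.Length; c++)
            {
                if (c > 0) sql.Append(',');
                sql.Append("@p").Append(r).Append('_').Append(c);
            }
            sql.Append(')');
        }
        return sql.ToString();
    }
}

// Possible use with a SqlCommand: set cmd.CommandText to the generated text,
// then add one value per placeholder, for example
//   cmd.Parameters.AddWithValue("@p" + r + "_" + c, value);
// and call cmd.ExecuteNonQuery() once per batch.
```

With the 19 columns from the question, this works out to 105 rows per round trip instead of one. SqlBulkCopy will most likely still be faster, but this keeps the code in plain ADO.NET.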

You create the SqlCommand object for every row of data. The simplest improvement would therefore be to create a

private static SqlCommand cmdInsert

and declare the parameters with the Parameters.Add() method. Then for each data row, set the parameter values using

cmdInsert.Parameters["@paramXXX"].Value = valueXXX;

A second performance improvement might be to skip creation of Data objects for each row, and assign Parameter values directly from the list[] array.
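Put together, the two suggestions might look roughly like this. It is only a sketch: the ReusableInsert name is made up, just three of the question's columns are shown, and the NVarChar size of 255 is a guess at the table definition. The field positions are the ones used in the question's Data constructor.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class ReusableInsert
{
    // Builds the command and declares its parameters once, up front.
    public static SqlCommand Create(SqlConnection cn)
    {
        var cmd = new SqlCommand(
            "INSERT INTO LOGDATA ([client-ip],[Event-ID],[total-bytes]) " +
            "VALUES (@Clientip,@Eventid,@Totalbytes)", cn);

        cmd.Parameters.Add("@Clientip", SqlDbType.NVarChar, 255); // size is a guess
        cmd.Parameters.Add("@Eventid", SqlDbType.SmallInt);
        cmd.Parameters.Add("@Totalbytes", SqlDbType.Int);
        return cmd;
    }

    // Assigns values straight from the split line, without a Data object.
    public static void Bind(SqlCommand cmd, string[] list)
    {
        cmd.Parameters["@Clientip"].Value = list[2];
        cmd.Parameters["@Eventid"].Value = Convert.ToInt16(list[8]);
        cmd.Parameters["@Totalbytes"].Value = Convert.ToInt32(list[12]);
    }
}

// Inside the read loop, per non-blank line:
//   ReusableInsert.Bind(cmdInsert, strLine.Split('\t'));
//   cmdInsert.ExecuteNonQuery();
```

This still costs one round trip per row, so expect a smaller gain than from batching or SqlBulkCopy. It mainly removes the per-row object allocation and the type inference that AddWithValue performs on every call.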
