
Parsing and uploading >1GB of data in C#

I have written a program to parse a huge amount of data and upload it to a database. The problem is that the parsing is far too slow. My program has a Parser class which parses each file (using parallelisation) and raises an event for each entry it parses:

Parallel.ForEach<FileInfo>(
    files,
    new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
    (inputFile, args) =>
    {
        // Using underlying FileStream to allow concurrent Read/Write access.
        using (var input = new StreamReader(inputFile.FullName))
        {
            while (!input.EndOfStream)
            {
                RaiseEntryParsed(ParseCity(input.ReadLine()));
            }
            ParsedFiles++;
            RaiseFileParsed(inputFile);
        }
    });
RaiseDirectoryParsed(Directory);

The "main" program subscribes to this event and adds the entries to a DataTable for a SqlBulkCopy; the SqlBulkCopy only submits when the parser class raises the FileParsed event (once per parsed file):

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(_connectionString))
{
    DataTable cityTable = DataContext.CreateCityDataTable();
    parser.EntryParsed +=
        (s, e) =>
        {
            DataRow cityRow = cityTable.NewRow();
            City parsedCity = (City)e.DatabaseEntry;

            cityRow["id"] = parsedCity.Id;
            ...
            ...

            cityTable.Rows.Add(cityRow);
        };

    parser.FileParsed +=
        (s, e) =>
        {
            bulkCopy.WriteToServer(cityTable);
            Dispatcher.BeginInvoke((Action)UpdateProgress);
            cityTable.Rows.Clear();
        };

    parser.DirectoryParsed +=
        (s, e) =>
        {
            bulkCopy.WriteToServer(cityTable);
            Dispatcher.BeginInvoke((Action)UpdateProgress);
        };

    parser.BeginParsing();
}

The table's rows are cleared after each submission to conserve memory and prevent an OutOfMemoryException from holding too many entities in memory...

How can I make this faster? It is currently unacceptably slow. I profiled the application, and most of the time is spent in the EntryParsed event. Thanks.

I made a short test project and tried out a few different approaches. My goal was to build a DataTable with 27 columns (id, A, B, C, ..., Z) and about 300,000 rows (NumOfRows) as quickly as possible, using just sequential code.

(Each row is populated with an id, and the rest of the columns are filled with random 5-letter words.)

On my fourth attempt, I stumbled upon a different syntax for adding a row to the table based on an array of values of type Object (see here).

In your case it would be something like:

cityTable.Rows.Add( new Object[] {
    ((City)e.DatabaseEntry).Id,
    ObjectThatGoesInColumn2,
    ObjectThatGoesInColumn3,
    ObjectThatGoesInLastColumn
} );

instead of:

DataRow row = cityTable.NewRow();

row[0] = 100;
row["City Name"] = "Anaheim";
row["Column 7"] = ...
...
row["Column 26"] = checksum;

workTable.Rows.Add( row );

This will give you a speed-up, since you won't be setting each column individually; based on your profiler screenshot, you have at least 12 columns that you were setting one at a time.

It also avoids hashing the column-name strings to find the array position you are dealing with, and then double-checking that the data type is correct.

In case you are interested, here is my test project:

class Program
{
    public static System.Data.DataSet dataSet;
    public static System.Data.DataSet dataSet2;
    public static System.Data.DataSet dataSet3;
    public static System.Data.DataSet dataSet4;

    public static Random rand = new Random();

    public static int NumOfRows = 300000;

    static void Main(string[] args)
    {

        #region test1

        Console.WriteLine("Starting");

        Console.WriteLine("");

        Stopwatch watch = new Stopwatch();

        watch.Start();

        MakeTable();

        watch.Stop();

        Console.WriteLine("Elapsed Time was: " + watch.ElapsedMilliseconds + " milliseconds.");
        dataSet = null;

        Console.WriteLine("");

        Console.WriteLine("Completed.");

        Console.WriteLine("");

        #endregion

        /*

        #region test2


        Console.WriteLine("Starting Test 2");

        Console.WriteLine("");

        watch.Reset();

        watch.Start();

        MakeTable2();

        watch.Stop();

        Console.WriteLine("Elapsed Time was: " + watch.ElapsedMilliseconds + " milliseconds.");
        dataSet2 = null;

        Console.WriteLine("");

        Console.WriteLine("Completed Test 2.");

        #endregion


        #region test3
        Console.WriteLine("");

        Console.WriteLine("Starting Test 3");

        Console.WriteLine("");

        watch.Reset();

        watch.Start();

        MakeTable3();

        watch.Stop();

        Console.WriteLine("Elapsed Time was: " + watch.ElapsedMilliseconds + " milliseconds.");
        dataSet3 = null;

        Console.WriteLine("");

        Console.WriteLine("Completed Test 3.");

        #endregion

         */ 

        #region test4
        Console.WriteLine("Starting Test 4");

        Console.WriteLine("");

        watch.Reset();

        watch.Start();

        MakeTable4();

        watch.Stop();

        Console.WriteLine("Elapsed Time was: " + watch.ElapsedMilliseconds + " milliseconds.");
        dataSet4 = null;

        Console.WriteLine("");

        Console.WriteLine("Completed Test 4.");

        #endregion


        //printTable();

        Console.WriteLine("");
        Console.WriteLine("Press Enter to Exit...");

        Console.ReadLine();
    }

    private static void MakeTable()
    {
        DataTable table = new DataTable("Table 1");

        DataColumn column;
        DataRow row;

        column = new DataColumn();
        column.DataType = System.Type.GetType("System.Int32");
        column.ColumnName = "id";
        column.ReadOnly = true;
        column.Unique = true;

        table.Columns.Add(column);


        for (int i = 65; i <= 90; i++)
        {
            column = new DataColumn();
            column.DataType = System.Type.GetType("System.String");
            column.ColumnName = "5-Letter Word " + (char)i;
            column.AutoIncrement = false;
            column.Caption = "Random Word " + (char)i;
            column.ReadOnly = false;
            column.Unique = false;
            // Add the column to the table.
            table.Columns.Add(column);
        }

        DataColumn[] PrimaryKeyColumns = new DataColumn[1];
        PrimaryKeyColumns[0] = table.Columns["id"];
        table.PrimaryKey = PrimaryKeyColumns;

        // Instantiate the DataSet variable.
        dataSet = new DataSet();
        // Add the new DataTable to the DataSet.
        dataSet.Tables.Add(table);

        // Create NumOfRows new DataRow objects and add 
        // them to the DataTable
        for (int i = 0; i < NumOfRows; i++)
        {
            row = table.NewRow();
            row["id"] = i;

            for (int j = 65; j <= 90; j++)
            {
                row["5-Letter Word " + (char)j] = getRandomWord();
            }

            table.Rows.Add(row);
        }

    }

    private static void MakeTable2()
    {
        DataTable table = new DataTable("Table 2");

        DataColumn column;
        DataRow row;

        column = new DataColumn();
        column.DataType = System.Type.GetType("System.Int32");
        column.ColumnName = "id";
        column.ReadOnly = true;
        column.Unique = true;

        table.Columns.Add(column);


        for (int i = 65; i <= 90; i++)
        {
            column = new DataColumn();
            column.DataType = System.Type.GetType("System.String");
            column.ColumnName = "5-Letter Word " + (char)i;
            column.AutoIncrement = false;
            column.Caption = "Random Word " + (char)i;
            column.ReadOnly = false;
            column.Unique = false;
            // Add the column to the table.
            table.Columns.Add(column);
        }

        DataColumn[] PrimaryKeyColumns = new DataColumn[1];
        PrimaryKeyColumns[0] = table.Columns["id"];
        table.PrimaryKey = PrimaryKeyColumns;

        // Instantiate the DataSet variable.
        dataSet2 = new DataSet();
        // Add the new DataTable to the DataSet.
        dataSet2.Tables.Add(table);

        // Create NumOfRows new DataRow objects and add 
        // them to the DataTable
        for (int i = 0; i < NumOfRows; i++)
        {
            row = table.NewRow();

            row.BeginEdit();

            row["id"] = i;

            for (int j = 65; j <= 90; j++)
            {
                row["5-Letter Word " + (char)j] = getRandomWord();
            }

            row.EndEdit();

            table.Rows.Add(row);
        }

    }

    private static void MakeTable3()
    {
        DataTable table = new DataTable("Table 3");

        DataColumn column;

        column = new DataColumn();
        column.DataType = System.Type.GetType("System.Int32");
        column.ColumnName = "id";
        column.ReadOnly = true;
        column.Unique = true;

        table.Columns.Add(column);


        for (int i = 65; i <= 90; i++)
        {
            column = new DataColumn();
            column.DataType = System.Type.GetType("System.String");
            column.ColumnName = "5-Letter Word " + (char)i;
            column.AutoIncrement = false;
            column.Caption = "Random Word " + (char)i;
            column.ReadOnly = false;
            column.Unique = false;
            // Add the column to the table.
            table.Columns.Add(column);
        }

        DataColumn[] PrimaryKeyColumns = new DataColumn[1];
        PrimaryKeyColumns[0] = table.Columns["id"];
        table.PrimaryKey = PrimaryKeyColumns;

        // Instantiate the DataSet variable.
        dataSet3 = new DataSet();
        // Add the new DataTable to the DataSet.
        dataSet3.Tables.Add(table);


        DataRow[] newRows = new DataRow[NumOfRows];

        for (int i = 0; i < NumOfRows; i++)
        {
            newRows[i] = table.NewRow();
        }

        // Populate the pre-created DataRow objects and add 
        // them to the DataTable
        for (int i = 0; i < NumOfRows; i++)
        {

            newRows[i]["id"] = i;

            for (int j = 65; j <= 90; j++)
            {
                newRows[i]["5-Letter Word " + (char)j] = getRandomWord();
            }

            table.Rows.Add(newRows[i]);
        }

    }

    private static void MakeTable4()
    {
        DataTable table = new DataTable("Table 4");

        DataColumn column;

        column = new DataColumn();
        column.DataType = System.Type.GetType("System.Int32");
        column.ColumnName = "id";
        column.ReadOnly = true;
        column.Unique = true;

        table.Columns.Add(column);


        for (int i = 65; i <= 90; i++)
        {
            column = new DataColumn();
            column.DataType = System.Type.GetType("System.String");
            column.ColumnName = "5-Letter Word " + (char)i;
            column.AutoIncrement = false;
            column.Caption = "Random Word " + (char)i;
            column.ReadOnly = false;
            column.Unique = false;
            // Add the column to the table.
            table.Columns.Add(column);
        }

        DataColumn[] PrimaryKeyColumns = new DataColumn[1];
        PrimaryKeyColumns[0] = table.Columns["id"];
        table.PrimaryKey = PrimaryKeyColumns;

        // Instantiate the DataSet variable.
        dataSet4 = new DataSet();
        // Add the new DataTable to the DataSet.
        dataSet4.Tables.Add(table);

        // Create NumOfRows new rows and add 
        // them to the DataTable
        for (int i = 0; i < NumOfRows; i++)
        {

            table.Rows.Add( 

                new Object[] {

                    i,

                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord(),
                    getRandomWord()

                } 

            );
        }

    }



    private static string getRandomWord()
    {
        // Random.Next's upper bound is exclusive, so use 91 to include 'Z'.
        char c0 = (char)rand.Next(65, 91);
        char c1 = (char)rand.Next(65, 91);
        char c2 = (char)rand.Next(65, 91);
        char c3 = (char)rand.Next(65, 91);
        char c4 = (char)rand.Next(65, 91);

        return "" + c0 + c1 + c2 + c3 + c4;
    }

    private static void printTable()
    {
        foreach (DataRow row in dataSet.Tables[0].Rows)
        {
            Console.WriteLine( row["id"] + "--" + row["5-Letter Word A"] + " - " + row["5-Letter Word Z"] );
        }
    }


}

I haven't really looked at your parallelism yet, but there are a couple of things.

First, change "ParsedFiles++;" to "Interlocked.Increment(ref ParsedFiles);", or put a lock around it.
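A minimal sketch of why this matters (ParsedFiles is the counter from the question; the rest is illustrative): `++` is a read-modify-write, so two threads can read the same value and one increment is lost, while Interlocked.Increment performs the whole operation atomically.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Counters
{
    private static int ParsedFiles; // shared across parser threads

    static void Main()
    {
        // Simulate 1000 parallel "file parsed" notifications.
        Parallel.For(0, 1000, _ =>
        {
            // Unsafe alternative: ParsedFiles++ can lose updates under contention.
            // Interlocked.Increment is an atomic read-modify-write, no lock needed.
            Interlocked.Increment(ref ParsedFiles);
        });

        Console.WriteLine(ParsedFiles); // always 1000
    }
}
```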

Secondly, instead of the complicated event-driven parallelism, I would recommend using a pipeline pattern, which is well suited to this.

Use a concurrent queue (or blocking collection) from the concurrent collections to connect the stages.

The first stage holds the list of files to process.

Worker tasks dequeue a file from that work list, parse it, then add the result to the second stage.

In the second stage, a worker task takes items from the second-stage queue (the just-completed blocks for the DataTable) and uploads them to the database as soon as they are ready.


Edit:

I wrote a pipelined version of the code, which should help you on your way:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.IO;
using System.Data;

namespace dataTableTesting2
{
    class Program
    {          
        private const int BufferSize = 20; // Each buffer can only contain this many elements at a time.
                                           // This limits the total amount of memory.

        private const int MaxBlockSize = 100;

        private static BlockingCollection<string> buffer1 = new BlockingCollection<string>(BufferSize);

        private static BlockingCollection<string[]> buffer2 = new BlockingCollection<string[]>(BufferSize);

        private static BlockingCollection<Object[][]> buffer3 = new BlockingCollection<Object[][]>(BufferSize);

        /// <summary>
        /// Start Pipelines and wait for them to finish.
        /// </summary>
        static void Main(string[] args)
        {
            TaskFactory f = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.None);

            Task stage0 = f.StartNew(() => PopulateFilesList(buffer1));
            Task stage1 = f.StartNew(() => ReadFiles(buffer1, buffer2));
            Task stage2 = f.StartNew(() => ParseStringBlocks(buffer2, buffer3));
            Task stage3 = f.StartNew(() => UploadBlocks(buffer3) );

            Task.WaitAll(stage0, stage1, stage2, stage3);

            /*
            // Note for more workers on particular stages you can make more tasks for each stage, like the following
            //    which populates the file list in 1 task, reads the files into string[] blocks in 1 task,
            //    then parses the string[] blocks in 4 concurrent tasks
            //    and lastly uploads the info in 2 tasks

            TaskFactory f = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.None);

            Task stage0 = f.StartNew(() => PopulateFilesList(buffer1));
            Task stage1 = f.StartNew(() => ReadFiles(buffer1, buffer2));

            Task stage2a = f.StartNew(() => ParseStringBlocks(buffer2, buffer3));
            Task stage2b = f.StartNew(() => ParseStringBlocks(buffer2, buffer3));
            Task stage2c = f.StartNew(() => ParseStringBlocks(buffer2, buffer3));
            Task stage2d = f.StartNew(() => ParseStringBlocks(buffer2, buffer3));

            Task stage3a = f.StartNew(() => UploadBlocks(buffer3) );
            Task stage3b = f.StartNew(() => UploadBlocks(buffer3) );

            Task.WaitAll(stage0, stage1, stage2a, stage2b, stage2c, stage2d, stage3a, stage3b);

            */
        }

        /// <summary>
        /// Adds the filenames to process into the first pipeline
        /// </summary>
        /// <param name="output"></param>
        private static void PopulateFilesList( BlockingCollection<string> output )
        {
            try
            {
                output.Add("file1.txt");
                output.Add("file2.txt");
                //...
                output.Add("lastFile.txt");
            }
            finally
            {
                output.CompleteAdding();
            }
        }

        /// <summary>
        /// Takes filenames out of the first pipeline, reads them into string[] blocks, and puts them in the second pipeline
        /// </summary>
        private static void ReadFiles( BlockingCollection<string> input, BlockingCollection<string[]> output)
        {
            try
            {
                foreach (string file in input.GetConsumingEnumerable())
                {
                    List<string> list = new List<string>(MaxBlockSize);

                    using (StreamReader sr = new StreamReader(file))
                    {
                        int countLines = 0;

                        while (!sr.EndOfStream)
                        {
                            list.Add( sr.ReadLine() );
                            countLines++;

                            if (countLines >= MaxBlockSize)
                            {
                                output.Add(list.ToArray());
                                countLines = 0;
                                list = new List<string>(MaxBlockSize);
                            }
                        }

                        if (list.Count > 0)
                        {
                            output.Add(list.ToArray());
                        }
                    }
                }
            }

            finally
            {
                output.CompleteAdding();
            }
        }

        /// <summary>
        /// Takes string[] blocks from the second pipeline, for each line, splits them by tabs, and parses
        /// the data, storing each line as an object array into the third pipline.
        /// </summary>
        private static void ParseStringBlocks( BlockingCollection<string[]> input, BlockingCollection< Object[][] > output)
        {
            try
            {
                foreach (string[] block in input.GetConsumingEnumerable())
                {
                    // Start a fresh result list for each block, so rows that have
                    // already been passed downstream are not emitted again.
                    List<Object[]> result = new List<object[]>(MaxBlockSize);

                    foreach (string line in block)
                    {
                        string[] splitLine = line.Split('\t'); // split line on tab

                        string cityName = splitLine[0];
                        int cityPop = Int32.Parse(splitLine[1]);
                        int cityElevation = Int32.Parse(splitLine[2]);
                        //...

                        result.Add(new Object[] { cityName, cityPop, cityElevation });
                    }

                    output.Add(result.ToArray());
                }
                }
            }

            finally
            {
                output.CompleteAdding();
            }
        }

        /// <summary>
        /// Takes the data blocks from the third pipeline, and uploads each row to SQL Database
        /// </summary>
        private static void UploadBlocks(BlockingCollection<Object[][]> input)
        {
            /*
             * At this point 'block' is an array of object arrays.
             * 
             * The block contains MaxBlockSize number of cities.
             * 
             * There is one object array for each city.
             * 
             * The object array for the city is in the pre-defined order from pipeline stage2
             * 
             * You could do a couple of things at this point:
             * 
             * 1. declare and initialize a DataTable with the correct column types
             *    then, do the  dataTable.Rows.Add( rowValues )
             *    then, use a Bulk Copy Operation to upload the dataTable to SQL
             *    http://msdn.microsoft.com/en-us/library/7ek5da1a
             * 
             * 2. Manually perform the sql commands/transactions similar to what 
             *    Kevin recommends in this suggestion:
             *    http://stackoverflow.com/questions/1024123/sql-insert-one-row-or-multiple-rows-data/1024195#1024195
             * 
             * I've demonstrated the first approach with this code.
             * 
             * */


            DataTable dataTable = new DataTable();

            //set up columns of dataTable here.

            foreach (Object[][] block in input.GetConsumingEnumerable())
            {
                foreach (Object[] rowValues in block)
                {

                    dataTable.Rows.Add(rowValues);
                }

                //do bulkCopy to upload table containing MaxBlockSize number of cities right here.

                dataTable.Rows.Clear(); //Remove the rows when you are done uploading, but not the dataTable.
            }
        }

    }
}

It breaks the work up into four parts, which can be done by different tasks:

  1. make a list of files to process

  2. take files from that list and read them into string[]'s

  3. take the string[]'s from the previous part and parse them, creating object[]'s containing the values for each row of the table

  4. upload the rows to the database

It is also easy to assign more than one task to each phase, allowing multiple workers to execute the same pipeline stage if desired.

(I doubt that having more than one task reading from file would be useful, unless you are using a solid-state drive, since jumping around on disk is quite slow.)

Also, you can set a limit on the amount of data in memory throughout the execution of the program.

Each buffer is a BlockingCollection initialized with a maximum size, which means that if a buffer is full and another task tries to add an element, that task will block.

Fortunately, the Task Parallel Library is smart: if a task is blocked, it will schedule a different task that isn't blocked, and check later to see whether the first task is still blocked.
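A tiny, self-contained illustration of this bounded-buffer behavior (the capacity and item count here are arbitrary, not taken from the pipeline above):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BoundedBufferDemo
{
    static void Main()
    {
        // Capacity 2: a third Add blocks until a consumer takes an item.
        var buffer = new BlockingCollection<int>(boundedCapacity: 2);

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 5; i++)
                buffer.Add(i);       // blocks whenever the buffer is full
            buffer.CompleteAdding(); // signals consumers that no more items are coming
        });

        // GetConsumingEnumerable ends once the collection is
        // marked complete and fully drained.
        foreach (int item in buffer.GetConsumingEnumerable())
            Console.WriteLine(item); // prints 0 through 4

        producer.Wait();
    }
}
```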

At present each buffer can only hold 20 items, and each item is at most 100 lines (or 100 rows), meaning that:

  • buffer1 will contain up to 20 filenames at any time.

  • buffer2 will contain up to 20 blocks of strings (of 100 lines each) from those files at any time.

  • buffer3 will contain up to 20 blocks of data (the object values for 100 cities each) at any time.

So this would take enough memory to hold 20 filenames, 2,000 lines from the files, and 2,000 cities' worth of information (with a bit extra for local variables and such).

You will likely want to increase BufferSize and MaxBlockSize for efficiency, although as is, this should work.

Note: I haven't tested this, as I didn't have any input files, so there could be some bugs.

Whilst I agree with some of the other comments and answers, have you tried calling:

cityTable.BeginLoadData();

before the first item is added to the city table, then calling:

cityTable.EndLoadData();

in the FileParsed event handler? (BeginLoadData suspends notifications, index maintenance, and constraint checking while rows are being added.)

If you're looking for raw performance, wouldn't something like this be the best option? It completely bypasses the DataTable code, which seems to be an unnecessary step.

void BulkInsertFile(string fileName, string tableName)
{
    FileInfo info = new FileInfo(fileName);
    string name = info.Name;
    string shareDirectory = "";  //the path of the share: \\servername\shareName\
    string serverDirectory = ""; //the local path of the share on the server: C:\shareName\

    File.Copy(fileName, shareDirectory + name);
    // or you could call your method to parse the file and write it to the share directory.

    using (SqlConnection cnn = new SqlConnection("connectionString"))
    {
        cnn.Open();
        using (SqlCommand cmd = cnn.CreateCommand())
        {
            cmd.CommandText = string.Format("bulk insert {0} from '{1}' with (fieldterminator = ',', rowterminator = '\n')", tableName, serverDirectory + name);

            try
            {
                cmd.ExecuteScalar();
            }
            catch (SqlException ex)
            {
                MessageBox.Show(ex.Message);
            }
        }
    }
}

Here is some information on the BULK INSERT command.
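For reference, the command string built above expands to something like the following T-SQL (the table name, path, and the optional TABLOCK hint here are illustrative; note the path is resolved on the SQL Server machine, not the client):

```sql
-- Hypothetical example: load a comma-delimited file into dbo.Cities.
BULK INSERT dbo.Cities
FROM 'C:\shareName\cities.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK              -- take a bulk-update table lock for faster loading
);
```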
