Fastest way to load data from text file then store it into database

I have a problem. I'm developing a project, but I'm stuck on this part:

I want to load data from text files and store it in an Access database. Each text file contains about 12,000 lines of data, and each file takes about 10 minutes to process.

NOTE: Before storing the data, I split each line of the text file into string fields, then check whether the record is already in the database. If it is, I update it; if not, I use an insert statement.

I'm using C# to develop this program. Is there a faster way to load and store this data?

UPDATED:

This is my code; I hope it helps in understanding my problem:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Collections;
using System.Data.OleDb;

namespace DAF
{
    public partial class FrontForm : Form
    {
        public Boolean status;

        public FrontForm()
        {
            InitializeComponent();

            //define location of the database
            string connection = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\DAFLogisticDepartment.mdb";

            //define location of the text file data
            DirectoryInfo di = new DirectoryInfo(@"C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\");
            FileInfo[] fiarr = di.GetFiles("*.txt");


            //define connection to database
            OleDbConnection con = new OleDbConnection(connection);
            String query;
            OleDbDataReader rdr = null;

            con.Open();
            //get all table from database
            OleDbCommand cmd = con.CreateCommand();
            DataTable dt = con.GetSchema("tables");
            DataRow[] dttable = dt.Select();
            con.Close();

            //read each new textfile inside the folder
            foreach (FileInfo fri in fiarr)
            {
                StreamReader sr = new StreamReader(fri.FullName, System.Text.Encoding.Default);
                String line;
                String tabledbs, dbsName;

                while ((line = sr.ReadLine()) != null)
                {
                    String VRSD, locationID, truckID, yearIn, yearOut, weekIn, weekOut, dayIn, dayOut, timeIn, timeOut, route;
                    int plantID;

                    //process each line of data and put into each variable
                    VRSD = line.Substring(0, 4).Trim();
                    plantID = Convert.ToInt32(line.Substring(4, 1).Trim());
                    locationID = line.Substring(5, 4).Trim();
                    truckID = line.Substring(24, 5).Trim();
                    yearIn = line.Substring(32, 4).Trim();
                    weekIn = line.Substring(36, 2).Trim();
                    dayIn = line.Substring(38, 1).Trim();
                    timeIn = line.Substring(39, 8).Trim();
                    yearOut = line.Substring(47, 4).Trim();
                    weekOut = line.Substring(51, 2).Trim();
                    dayOut = line.Substring(53, 1).Trim();
                    timeOut = line.Substring(54, 8).Trim();
                    route = line.Substring(64, 2).Trim();

                    //make database name
                    dbsName = plantID + locationID;

                    con.Open();
                    //check if the table exist in database
                    for (int i = 0; i < dttable.Length - 9; i++)
                    {
                        tabledbs = dttable[i]["TABLE_NAME"].ToString();
                        ArrayList indexlist = new ArrayList();

                        if (tabledbs == dbsName)
                        {
                            //if the table exist, status = true
                            status = true;
                            break;
                        }
                    }
                    con.Close();

                    con.Open();

                    if (status == true)
                    {

                        try
                        {        
                            //if the data not in the system, insert statement
                            query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + truckID + "','" + yearIn + "','" + weekIn + "','" + dayIn + "','" + timeIn + "','" + yearOut + "','" + weekOut + "','" + dayOut + "','" + timeOut + "')";
                            cmd = new OleDbCommand(query, con);
                            rdr = cmd.ExecuteReader();
                            con.Close();
                        }
                        catch
                        {
                            //if the data in the system, update statement
                            query = @"update " + dbsName + " set YearIn='" + yearIn + "', YearOut='" + yearOut + "', WeekIn='" + weekIn + "', WeekOut='" + weekOut + "', DayIn='" + dayIn + "', DayOut='" + dayOut + "', TimeIn='" + timeIn + "', TimeOut='" + timeOut + "' where LocationID='" + locationID + "' and PlantID=" + plantID + "";
                            cmd = new OleDbCommand(query, con);
                            rdr = cmd.ExecuteReader();
                            con.Close();
                        }

                    }
                    else
                    {
                        //create new table
                        string attribute = "VRSD String,PlantID Integer, LocationID String, TruckID String," +
                                            "YearIn String, WeekIn String, DayIn String, TimeIn String," +
                                            "YearOut String, WeekOut String, DayOut String, TimeOut String";

                        query = @"CREATE TABLE " + plantID + locationID + "(" + attribute + ")";
                        cmd = new OleDbCommand(query, con);
                        cmd.ExecuteNonQuery();

                        //insert the data
                        query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + truckID + "','" + yearIn + "','" + weekIn + "','" + dayIn + "','" + timeIn + "','" + yearOut + "','" + weekOut + "','" + dayOut + "','" + timeOut + "')";
                        cmd = new OleDbCommand(query, con);
                        rdr = cmd.ExecuteReader();
                        con.Close();
                    }

                    status = false;
                }
                sr.Close();

                //after the text file load into database, the text file moved to history folder
                MessageBox.Show(fri.FullName.ToString(), "File Manager", MessageBoxButtons.OK);
                fri.MoveTo(@"C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\History\" + fri.Name.ToString() + ".txt");
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            StandardReport sr = new StandardReport();
            sr.Show();
        }

        private void FrontForm_Load(object sender, EventArgs e)
        {

        }
    }
}

The big time killer here is the sheer number of database connections you are opening. Try building an in-memory list of commands (which takes almost no time per object compared with reading the data from the file), and once you've built your list, execute them all over a single connection. It takes time to open each connection, and you're doing that far more often than needed. Edit: I've just noticed you are actually opening and closing 2 connections per line per file!


Currently (pseudocode for clarity):

For each file (x6)

   Load file from stream

   For each line in file (x12k)

     Read data from line

     Open database connection (happens 72k times)
     Check whether table exists
     Close connection

     Open connection (x72k)
     Try to insert record
     If inserting fails, update existing record
     Close connection

     Next line

   Close filestream

Next file

Suggestion (and I strongly suggest you think about the implications of adding tables dynamically; it's not normally a good solution, but if it's imposed on you, maybe you have no choice):

Create an in-memory list of commands 
  (or list of custom objects with property for each command type, create 
  table,insert,update)    

For each file (x6)

   Load file from stream

   For each line in file (x12k)

      Read data from line (all happens 72k times, but no external connections per line)

      Write your create table command
      Write your insert command
      Write your update command
      Add to relevant command lists

   Next Line

   Close filestream

Next File

Open database connection (x1)

For each command in your list
   Apply suitable logic as to whether command needs to execute
   Execute command if applicable
Next command

Close database connection
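
The outline above could be sketched in C# roughly as follows. This is only a minimal sketch under stated assumptions, not a drop-in replacement for the question's code: `TruckRecord`, `LineParser.Parse` and `BulkLoader.InsertAll` are hypothetical names, the column offsets are copied from the question's `Substring` calls, and the loader is written against `System.Data.Common.DbConnection` so that an `OleDbConnection` can be passed in. It assumes the target tables already exist and covers only the insert path, with all inserts going over one connection inside one transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

// Hypothetical container for one parsed line (names are illustrative).
public sealed class TruckRecord
{
    public string Vrsd, LocationId, TruckId;
    public int PlantId;
    public string YearIn, WeekIn, DayIn, TimeIn;
    public string YearOut, WeekOut, DayOut, TimeOut;

    // Table name as built in the question: plantID + locationID.
    public string TableName { get { return PlantId + LocationId; } }
}

public static class LineParser
{
    // Offsets are the ones used in the question's Substring calls.
    public static TruckRecord Parse(string line)
    {
        return new TruckRecord
        {
            Vrsd = line.Substring(0, 4).Trim(),
            PlantId = Convert.ToInt32(line.Substring(4, 1).Trim()),
            LocationId = line.Substring(5, 4).Trim(),
            TruckId = line.Substring(24, 5).Trim(),
            YearIn = line.Substring(32, 4).Trim(),
            WeekIn = line.Substring(36, 2).Trim(),
            DayIn = line.Substring(38, 1).Trim(),
            TimeIn = line.Substring(39, 8).Trim(),
            YearOut = line.Substring(47, 4).Trim(),
            WeekOut = line.Substring(51, 2).Trim(),
            DayOut = line.Substring(53, 1).Trim(),
            TimeOut = line.Substring(54, 8).Trim()
        };
    }
}

public static class BulkLoader
{
    // Executes every insert over ONE connection inside ONE transaction.
    // The connection is expected to be closed when passed in.
    public static int InsertAll(DbConnection con, IEnumerable<TruckRecord> records)
    {
        int rows = 0;
        con.Open();
        try
        {
            using (DbTransaction tx = con.BeginTransaction())
            {
                foreach (TruckRecord r in records)
                {
                    using (DbCommand cmd = con.CreateCommand())
                    {
                        cmd.Transaction = tx;
                        // Positional "?" markers, as used by the OLE DB provider.
                        // A table name cannot be a parameter, so it is concatenated.
                        cmd.CommandText = "INSERT INTO [" + r.TableName
                            + "] VALUES (?,?,?,?,?,?,?,?,?,?,?,?)";
                        object[] values =
                        {
                            r.Vrsd, r.PlantId, r.LocationId, r.TruckId,
                            r.YearIn, r.WeekIn, r.DayIn, r.TimeIn,
                            r.YearOut, r.WeekOut, r.DayOut, r.TimeOut
                        };
                        foreach (object v in values)
                        {
                            DbParameter p = cmd.CreateParameter();
                            p.Value = v;
                            cmd.Parameters.Add(p);
                        }
                        rows += cmd.ExecuteNonQuery();
                    }
                }
                tx.Commit();
            }
        }
        finally
        {
            con.Close();
        }
        return rows;
    }
}
```

A caller would first read all files into a `List<TruckRecord>` with `LineParser.Parse`, then call `BulkLoader.InsertAll(new OleDbConnection(connection), records)` once. Using parameters instead of string concatenation for the values also avoids quoting problems in the data.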

Why don't you try creating and using an SSIS package? It's very good at this sort of thing, has excellent tooling, and is quite simple to use from code.

http://msdn.microsoft.com/en-us/library/ms141026.aspx

http://blogs.msdn.com/b/michen/archive/2007/03/22/running-ssis-package-programmatically.aspx

You can use a query to insert unmatched records using the Jet driver for text.

SELECT a.* INTO NewTable FROM 
(SELECT * From [Text;DSN=Import Link Specification;FMT=Delimited;HDR=NO;IMEX=2;CharacterSet=850;DATABASE=C:\Docs].[Import.txt]) As A
LEFT JOIN OldTable ON a.Key=OldTable.Key
WHERE OldTable.Key Is Null

EDIT

I wonder why you do not have a main table containing all plants and locations. You could then insert all the files into a temp table and either append or update from temp accordingly.

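foreach (FileInfo fri in fiarr)
{
    // Build a Jet text-driver source for this file (DSN names a saved Access specification)
    string s = "[Text;DSN=Test Spec;"
         + "FMT=Fixed;HDR=Yes;IMEX=2;CharacterSet=850;DATABASE="
         + fri.DirectoryName + "].["
         + fri.Name + "]";

    query = "INSERT INTO Temp SELECT * FROM " + s;

    // Assign the query before executing; the connection must already be open
    cmd.CommandText = query;
    cmd.ExecuteNonQuery();
}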

You seem to be using a fixed-length format, so DSN=Test Spec is an Access specification created by exporting the file in fixed-width format and then saving the specification using the Advanced button.
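
The "append or update from temp" step could look something like the sketch below. It only builds the two Jet SQL statements; the table names, key columns and data columns are all assumptions passed in by the caller (the question does not say what uniquely identifies a record), and `TempMerge` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

public static class TempMerge
{
    // Builds "([a].[k]=[b].[k]) AND ..." for the join condition.
    static string JoinOn(string a, string b, string[] keys)
    {
        return string.Join(" AND ",
            keys.Select(k => "([" + a + "].[" + k + "]=[" + b + "].[" + k + "])"));
    }

    // UPDATE ... INNER JOIN: overwrite the given columns of rows that
    // already exist in the main table with the values from the temp table.
    public static string BuildUpdate(string main, string temp, string[] keys, string[] cols)
    {
        string sets = string.Join(", ",
            cols.Select(c => "[" + main + "].[" + c + "]=[" + temp + "].[" + c + "]"));
        return "UPDATE [" + main + "] INNER JOIN [" + temp + "] ON "
             + JoinOn(main, temp, keys) + " SET " + sets;
    }

    // INSERT ... LEFT JOIN ... Is Null: append only the temp rows that
    // have no match in the main table.
    public static string BuildAppend(string main, string temp, string[] keys)
    {
        return "INSERT INTO [" + main + "] SELECT [" + temp + "].* FROM [" + temp
             + "] LEFT JOIN [" + main + "] ON " + JoinOn(temp, main, keys)
             + " WHERE [" + main + "].[" + keys[0] + "] Is Null";
    }
}
```

Each statement would be assigned to `cmd.CommandText` and run with `cmd.ExecuteNonQuery()` over the same open connection, update first and append second, so the whole import becomes two set-based statements rather than one statement per line.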

One problem here might be that you are inserting each record line by line, with a separate SQL statement executed for every record.

Another solution would be:

  1. Read the text file into a string buffer (20,000 lines)
  2. Create a DataTable object
  3. In a loop, insert every line into the DataTable object.
  4. Finally, with a DataAdapter, write the DataTable back into the database.
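
The steps above can be sketched as follows. This is a hedged illustration rather than a complete solution: `DataTableLoader` is a hypothetical name, only the first four fields from the question are mapped to keep it short, and `Save` takes any `DbDataAdapter` so the sketch does not depend on a particular provider.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

public static class DataTableLoader
{
    // Steps 2 and 3: create the DataTable and add one row per line.
    public static DataTable BuildTable(IEnumerable<string> lines)
    {
        var table = new DataTable("Import");
        table.Columns.Add("VRSD", typeof(string));
        table.Columns.Add("PlantID", typeof(int));
        table.Columns.Add("LocationID", typeof(string));
        table.Columns.Add("TruckID", typeof(string));

        foreach (string line in lines)
        {
            // Skip blank or truncated lines instead of throwing.
            if (line.Length < 29) continue;

            // Offsets are the ones used in the question.
            table.Rows.Add(
                line.Substring(0, 4).Trim(),
                Convert.ToInt32(line.Substring(4, 1)),
                line.Substring(5, 4).Trim(),
                line.Substring(24, 5).Trim());
        }
        return table;
    }

    // Step 4: push all pending rows to the database in one call.
    // Rows created with Rows.Add are in the Added state, so the
    // adapter uses its InsertCommand for them.
    public static int Save(DbDataAdapter adapter, DataTable table)
    {
        return adapter.Update(table);
    }
}
```

With the OLE DB provider, the adapter could be an `OleDbDataAdapter` created from a `SELECT` on the target table, with an `OleDbCommandBuilder` attached to generate the insert command. As far as I know the adapter still sends one INSERT per added row, so the gain comes mainly from using a single connection rather than from true batching.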
