
Fastest way to load data from a text file and then store it in a database

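I have a problem.

I'm working on a project, but I'm stuck on this part:

I want to load data from text files and store it in an Access database. Each text file contains about 12,000 lines of data, and each file takes about 10 minutes to process.

Note: before storing the data, I split each line of the text file into fields and put them into strings, then check whether the data is already in the database. If it is, I update it; if not, I use an insert statement.

I'm developing this program in C#. Is there a faster way to load and store this data?

Update:

Here is my code; I hope it helps in understanding my problem: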

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Collections;
using System.Data.OleDb;

namespace DAF
{
    public partial class FrontForm : Form
    {
        public Boolean status;

        public FrontForm()
        {
            InitializeComponent();

            //define location of the database
            string connection = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\DAFLogisticDepartment.mdb";

            //define location of the text file data
            DirectoryInfo di = new DirectoryInfo(@"C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\");
            FileInfo[] fiarr = di.GetFiles("*.txt");


            //define connection to database
            OleDbConnection con = new OleDbConnection(connection);
            String query;
            OleDbDataReader rdr = null;

            con.Open();
            //get all table from database
            OleDbCommand cmd = con.CreateCommand();
            DataTable dt = con.GetSchema("tables");
            DataRow[] dttable = dt.Select();
            con.Close();

            //read each new textfile inside the folder
            foreach (FileInfo fri in fiarr)
            {
                StreamReader sr = new StreamReader(fri.FullName, System.Text.Encoding.Default);
                String line;
                String tabledbs, dbsName;

                while ((line = sr.ReadLine()) != null)
                {
                    String VRSD, locationID, truckID, yearIn, yearOut, weekIn, weekOut, dayIn, dayOut, timeIn, timeOut, route;
                    int plantID;

                    //process each line of data and put into each variable
                    VRSD = line.Substring(0, 4).Trim();
                    plantID = Convert.ToInt32(line.Substring(4, 1).Trim());
                    locationID = line.Substring(5, 4).Trim();
                    truckID = line.Substring(24, 5).Trim();
                    yearIn = line.Substring(32, 4).Trim();
                    weekIn = line.Substring(36, 2).Trim();
                    dayIn = line.Substring(38, 1).Trim();
                    timeIn = line.Substring(39, 8).Trim();
                    yearOut = line.Substring(47, 4).Trim();
                    weekOut = line.Substring(51, 2).Trim();
                    dayOut = line.Substring(53, 1).Trim();
                    timeOut = line.Substring(54, 8).Trim();
                    route = line.Substring(64, 2).Trim();

                    //make database name
                    dbsName = plantID + locationID;

                    con.Open();
                    //check if the table exist in database
                    for (int i = 0; i < dttable.Length - 9; i++)
                    {
                        tabledbs = dttable[i]["TABLE_NAME"].ToString();
                        ArrayList indexlist = new ArrayList();

                        if (tabledbs == dbsName)
                        {
                            //if the table exist, status = true
                            status = true;
                            break;
                        }
                    }
                    con.Close();

                    con.Open();

                    if (status == true)
                    {

                        try
                        {        
                            //if the data not in the system, insert statement
                            query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + truckID + "','" + yearIn + "','" + weekIn + "','" + dayIn + "','" + timeIn + "','" + yearOut + "','" + weekOut + "','" + dayOut + "','" + timeOut + "')";
                            cmd = new OleDbCommand(query, con);
                            rdr = cmd.ExecuteReader();
                            con.Close();
                        }
                        catch
                        {
                            //if the data in the system, update statement
                            query = @"update " + dbsName + " set YearIn='" + yearIn + "', YearOut='" + yearOut + "', WeekIn='" + weekIn + "', WeekOut='" + weekOut + "', DayIn='" + dayIn + "', DayOut='" + dayOut + "', TimeIn='" + timeIn + "', TimeOut='" + timeOut + "' where LocationID='" + locationID + "' and PlantID=" + plantID + "";
                            cmd = new OleDbCommand(query, con);
                            rdr = cmd.ExecuteReader();
                            con.Close();
                        }

                    }
                    else
                    {
                        //create new table
                        string attribute = "VRSD String,PlantID Integer, LocationID String, TruckID String," +
                                            "YearIn String, WeekIn String, DayIn String, TimeIn String," +
                                            "YearOut String, WeekOut String, DayOut String, TimeOut String";

                        query = @"CREATE TABLE " + plantID + locationID + "(" + attribute + ")";
                        cmd = new OleDbCommand(query, con);
                        cmd.ExecuteNonQuery();

                        //insert the data
                        query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + truckID + "','" + yearIn + "','" + weekIn + "','" + dayIn + "','" + timeIn + "','" + yearOut + "','" + weekOut + "','" + dayOut + "','" + timeOut + "')";
                        cmd = new OleDbCommand(query, con);
                        rdr = cmd.ExecuteReader();
                        con.Close();
                    }

                    status = false;
                }
                sr.Close();

                //after the text file load into database, the text file moved to history folder
                MessageBox.Show(fri.FullName.ToString(), "File Manager", MessageBoxButtons.OK);
                fri.MoveTo(@"C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\History\" + fri.Name.ToString() + ".txt");
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            StandardReport sr = new StandardReport();
            sr.Show();
        }

        private void FrontForm_Load(object sender, EventArgs e)
        {

        }
    }
}

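The main killer here is the sheer number of database connections you're using. Try building an in-memory list of commands (each object takes almost no time compared with reading the data from the file), and once you've finished building the list, execute them all on a single connection. Opening each connection takes time, and you're doing it far more often than necessary. Edit: I just noticed you're actually opening/closing 2 connections per line of every file!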


Current (pseudocode, for clarity):

For each file (x6)

   Load file from stream

   For each line in file (x12k)

     Read data from line

     Open database connection (happens 72k times)
     Check whether table exists
     Close connection

     Open connection (x72k)
     Try to insert record
     If inserting fails, update existing record
     Close connection

     Next line

   Close filestream

Next file
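The "check whether table exists" step does not actually need a connection: the question already loads the table list into `dttable` once, before the loop. A minimal sketch, assuming that same `GetSchema("tables")` result, turns it into a case-insensitive `HashSet<string>` built once; `TableLookup` and `FromSchema` are hypothetical names. As far as I can tell, the original loop also never records a table it has just created, so a second line for the same new table would attempt `CREATE TABLE` again; adding the name to the set after creating the table avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: one in-memory lookup instead of an Open/scan/Close per line.
public static class TableLookup
{
    // Builds a case-insensitive set of table names (Jet names are not case-sensitive)
    // from the DataTable returned by con.GetSchema("tables"). Call once, before the loop.
    public static HashSet<string> FromSchema(DataTable schema)
    {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (DataRow row in schema.Rows)
            names.Add(row["TABLE_NAME"].ToString()); // system tables are included; harmless here
        return names;
    }
}
```

Inside the line loop, `tables.Contains(dbsName)` would replace the `for` scan and its `con.Open()`/`con.Close()` pair, and `tables.Add(dbsName)` would follow the `CREATE TABLE`.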

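Suggested (I'd strongly recommend you consider the implications of adding tables dynamically; it's usually not a good solution, but if it has been imposed on you, perhaps you have no choice):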

Create an in-memory list of commands
  (or a list of custom objects with a property for each command type: create
  table, insert, update)

For each file (x6)

   Load file from stream

   For each line in file (x12k)

      Read data from line (all happens 72k times, but no external connections per line)

      Write your create table command
      Write your insert command
      Write your update command
      Add to relevant command lists

   Next Line

   Close filestream

Next File

Open database connection (x1)

For each command in your list
   Apply suitable logic as to whether command needs to execute
   Execute command if applicable
Next command

Close database connection
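One way to sketch the suggested two-phase approach in C#, assuming each line has already been parsed into a table name plus an array of values; the `QueuedCommand` and `CommandQueue` types are hypothetical, not part of the answer. Phase 1 only builds commands in memory. Phase 2 opens one connection, runs the `CREATE TABLE` commands, then executes every insert inside a single transaction. It uses the provider-neutral `System.Data.Common` classes, so an `OleDbConnection` can be passed in. The insert-or-update decision ("apply suitable logic") is left out; this sketch only queues inserts. Running the DDL outside the transaction is a cautious choice, since I'm not certain how Jet treats `CREATE TABLE` inside a transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;

// Hypothetical command object: SQL text plus positional parameter values.
public sealed class QueuedCommand
{
    public string Sql { get; }
    public object[] Values { get; }
    public QueuedCommand(string sql, object[] values) { Sql = sql; Values = values; }
}

public static class CommandQueue
{
    // Phase 1, no database access: every row becomes an INSERT, and a CREATE TABLE
    // is queued only the first time a table name is seen (knownTables is loaded once).
    public static (List<QueuedCommand> Creates, List<QueuedCommand> Inserts) Build(
        IEnumerable<(string Table, object[] Values)> rows,
        ISet<string> knownTables, string columnDefinitions)
    {
        var creates = new List<QueuedCommand>();
        var inserts = new List<QueuedCommand>();
        foreach (var (table, values) in rows)
        {
            if (knownTables.Add(table))
                creates.Add(new QueuedCommand($"CREATE TABLE [{table}] ({columnDefinitions})",
                                              Array.Empty<object>()));
            // OLE DB uses positional "?" placeholders, one per value.
            string marks = string.Join(",", Enumerable.Repeat("?", values.Length));
            inserts.Add(new QueuedCommand($"INSERT INTO [{table}] VALUES ({marks})", values));
        }
        return (creates, inserts);
    }

    // Phase 2: a single connection. DDL first, then all inserts in one transaction.
    public static void Run(DbConnection con, List<QueuedCommand> creates, List<QueuedCommand> inserts)
    {
        con.Open();
        try
        {
            foreach (QueuedCommand c in creates) Execute(con, null, c);
            using (DbTransaction tx = con.BeginTransaction())
            {
                foreach (QueuedCommand c in inserts) Execute(con, tx, c);
                tx.Commit();
            }
        }
        finally
        {
            con.Close();
        }
    }

    private static void Execute(DbConnection con, DbTransaction tx, QueuedCommand qc)
    {
        using (DbCommand cmd = con.CreateCommand())
        {
            cmd.Transaction = tx; // null for the DDL commands
            cmd.CommandText = qc.Sql;
            foreach (object v in qc.Values)
            {
                DbParameter p = cmd.CreateParameter();
                p.Value = v ?? DBNull.Value;
                cmd.Parameters.Add(p);
            }
            cmd.ExecuteNonQuery();
        }
    }
}
```

Because the values travel as parameters, the quoting problems of the string-concatenated SQL in the question also go away.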

Why not try creating and using an SSIS package? It's excellent at this kind of thing, has great tooling, and is very simple to call from code.

http://msdn.microsoft.com/en-us/library/ms141026.aspx

http://blogs.msdn.com/b/michen/archive/2007/03/22/running-ssis-package-programmatically.aspx

You can use the Jet driver to query the text file and insert the records that have no match.

SELECT a.* INTO NewTable FROM 
(SELECT * From [Text;DSN=Import Link Specification;FMT=Delimited;HDR=NO;IMEX=2;CharacterSet=850;DATABASE=C:\Docs].[Import.txt]) As A
LEFT JOIN OldTable ON a.Key=OldTable.Key
WHERE OldTable.Key Is Null

Edit

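I wonder why you don't have a single master table containing all plants and locations. Then you could insert all the files into a temporary table and append or update from it as appropriate.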

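foreach (FileInfo fri in fiarr)
{
    // Jet text-driver source: [Text;<options>;DATABASE=<folder>].[<file name>]
    string s = "[Text;DSN=Test Spec;"
         + "FMT=Fixed;HDR=Yes;IMEX=2;CharacterSet=850;DATABASE="
         + fri.DirectoryName + "].["
         + fri.Name + "]";

    query = "INSERT INTO Temp SELECT * FROM " + s;

    // cmd must belong to an open OleDbConnection; assign the query before executing it
    cmd.CommandText = query;
    cmd.ExecuteNonQuery();
}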

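You appear to be using a fixed-width format, so DSN=Test Spec refers to an Access specification, created by exporting a file in fixed-width format and then saving the specification with the "Advanced" button.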
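The "append or update from temp" step could be two set-based statements, run once after all files have been loaded into `Temp`. A sketch that only builds the SQL text; the master table name, the choice of key columns, and the requirement that `Temp` and the master table share the same column order are all assumptions, and `TempMerge` is a hypothetical name. It relies on Access/Jet SQL accepting `UPDATE ... INNER JOIN ... SET`, and uses the LEFT JOIN / IS NULL pattern for the append.

```csharp
using System.Linq;

// Hypothetical helper: builds the Jet SQL that folds a staging table into a master table.
public static class TempMerge
{
    public static string[] Build(string master, string temp, string[] keyCols, string[] dataCols)
    {
        // ([Master].[k] = [Temp].[k]) AND ... for every key column
        string on = string.Join(" AND ",
            keyCols.Select(k => $"([{master}].[{k}] = [{temp}].[{k}])"));

        // 1. Update master rows that already have a matching key in Temp.
        string update = $"UPDATE [{master}] INNER JOIN [{temp}] ON {on} SET "
            + string.Join(", ", dataCols.Select(c => $"[{master}].[{c}] = [{temp}].[{c}]"));

        // 2. Append Temp rows whose key has no match in master.
        string append = $"INSERT INTO [{master}] SELECT [{temp}].* FROM [{temp}] "
            + $"LEFT JOIN [{master}] ON {on} WHERE [{master}].[{keyCols[0]}] IS NULL";

        return new[] { update, append };
    }
}
```

With an open connection, each string would be assigned to `cmd.CommandText` and run with `cmd.ExecuteNonQuery()`, update first so freshly appended rows are not updated again, and `Temp` emptied (for example with `DELETE FROM Temp`) before the next import.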

One problem here may be that you are inserting each record row by row, executing a separate SQL statement for every single one.
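A sketch of the alternative to building a fresh SQL string per row, assuming the target table already exists: one parameterized `INSERT` command is created once, only its parameter values change per row, and all rows share a single transaction so there is one commit at the end. `ReusedInsert`, `BuildSql` and `InsertAll` are hypothetical names. OLE DB binds `?` placeholders by position, so the parameter order must match the column list; each `values` array is assumed to have one entry per column.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;

public static class ReusedInsert
{
    // INSERT with one positional "?" placeholder per column.
    public static string BuildSql(string table, IReadOnlyList<string> columns)
    {
        string names = string.Join(", ", columns.Select(c => "[" + c + "]"));
        string marks = string.Join(", ", Enumerable.Repeat("?", columns.Count));
        return "INSERT INTO [" + table + "] (" + names + ") VALUES (" + marks + ")";
    }

    // One command object for all rows; a single transaction means a single commit.
    public static int InsertAll(DbConnection con, string table,
                                IReadOnlyList<string> columns, IEnumerable<object[]> rows)
    {
        int count = 0;
        bool opened = false;
        if (con.State != ConnectionState.Open) { con.Open(); opened = true; }
        try
        {
            using (DbTransaction tx = con.BeginTransaction())
            using (DbCommand cmd = con.CreateCommand())
            {
                cmd.Transaction = tx;
                cmd.CommandText = BuildSql(table, columns);
                foreach (string c in columns)
                {
                    DbParameter p = cmd.CreateParameter();
                    p.ParameterName = c; // OLE DB ignores the name; position is what counts
                    cmd.Parameters.Add(p);
                }
                foreach (object[] values in rows)
                {
                    for (int i = 0; i < values.Length; i++)
                        cmd.Parameters[i].Value = values[i] ?? DBNull.Value;
                    count += cmd.ExecuteNonQuery();
                }
                tx.Commit();
            }
        }
        finally
        {
            if (opened) con.Close();
        }
        return count;
    }
}
```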

Another solution is:

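  1. Read the text file into a string buffer (20,000 lines).
  2. Create a DataTable object.
  3. In a loop, insert each line into the DataTable object.
  4. Finally, use a DataAdapter to write the DataTable back to the database.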
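The four steps above might look like this, assuming the fixed-width offsets from the question's `Substring` calls and the column names from its `CREATE TABLE`; `DataTableImport` and its methods are hypothetical. Step 1 could simply be `File.ReadLines(path)`. The adapter passed to `Save` needs an `InsertCommand`; with OLE DB that would typically be an `OleDbDataAdapter` over `SELECT * FROM [table]` plus an `OleDbCommandBuilder`. As far as I know, the OLE DB adapter still issues one INSERT per row, so most of the gain comes from using one connection rather than from true batching.

```csharp
using System.Data;
using System.Data.Common;

public static class DataTableImport
{
    // (column, start, length) taken from the Substring calls in the question.
    private static readonly (string Name, int Start, int Length)[] Layout =
    {
        ("VRSD", 0, 4), ("PlantID", 4, 1), ("LocationID", 5, 4), ("TruckID", 24, 5),
        ("YearIn", 32, 4), ("WeekIn", 36, 2), ("DayIn", 38, 1), ("TimeIn", 39, 8),
        ("YearOut", 47, 4), ("WeekOut", 51, 2), ("DayOut", 53, 1), ("TimeOut", 54, 8),
    };

    // Step 2: an empty DataTable; PlantID is the only integer column.
    public static DataTable CreateTable()
    {
        var table = new DataTable();
        foreach (var f in Layout)
            table.Columns.Add(f.Name, f.Name == "PlantID" ? typeof(int) : typeof(string));
        return table;
    }

    // Step 3: one DataRow per line; nothing touches the database yet.
    public static void AddLine(DataTable table, string line)
    {
        DataRow row = table.NewRow();
        foreach (var f in Layout)
        {
            string field = line.Substring(f.Start, f.Length).Trim();
            row[f.Name] = f.Name == "PlantID" ? (object)int.Parse(field) : field;
        }
        table.Rows.Add(row); // RowState becomes Added
    }

    // Step 4: the adapter runs its InsertCommand for every row in the Added state.
    public static int Save(DbDataAdapter adapter, DataTable table) => adapter.Update(table);
}
```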


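Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.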

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM