How can I insert 4 million records from Oracle into Elasticsearch faster using C#?

Question

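I have the following code written in C#, but at its current rate it would take 4-5 days to migrate the data from the Oracle database to Elasticsearch. I am inserting the records in batches of 100. Is there another way to move the 4 million records faster (in less than a day, if possible)?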

   public static void Selection()
        {
            for(int i = 1; i < 4000000; i += 1000)
            {
                for(int j = i; j < (i+1000); j += 100)
                {
                    OracleCommand cmd = new OracleCommand(BuildQuery(j), 
                                                     oracle_connection);
                    OracleDataReader reader = cmd.ExecuteReader();
                    List<Record> list=CreateRecordList(reader);
                    insert(list);
                }
            }
        }

   private static List<Record> CreateRecordList(OracleDataReader reader)
        {
            List<Record> l = new List<Record>();
            string[] str = new string[7];
            try
            {
                while (reader.Read())
                {
                    for (int i = 0; i < 7; i++)
                    {
                        str[i] = reader[i].ToString();
                    }

                    Record r = new Record(str[0], str[1], str[2], str[3],                              
                                str[4], str[5], str[6]);
                    l.Add(r);
                }
            }
            catch (Exception er)
            {
                string msg = er.Message;
            }
            return l;
        }

   private static string BuildQuery(int from)
        {
            int to = from + change - 1;
            StringBuilder builder = new StringBuilder();
            builder.AppendLine(@"select * from");
            builder.AppendLine("(");
            builder.AppendLine("select FIELD_1, FIELD_2, FIELD_3, FIELD_4, " +
                               "FIELD_5, FIELD_6, FIELD_7, ");
            builder.Append(" row_number() over(order by FIELD_1) rn");
            builder.AppendLine("   from tablename");
            builder.AppendLine(")");
            builder.AppendLine(string.Format("where rn between {0} and {1}", 
            from, to));
            builder.AppendLine("order by rn");
            return builder.ToString();
        }

   public static void insert(List<Record> l)
        {
            try
            {
                foreach(Record r in l)
                    client.Index<Record>(r, "index", "type");
            }
            catch (Exception er)
            {
                string msg = er.Message;
            }
        }

Answer 1

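The ROW_NUMBER() function hurts performance, and you are running it thousands of times. You are already using an OracleDataReader; it does not pull all four million rows to your machine at once, it basically streams them one or a few at a time.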

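This has to be doable in minutes or hours, not days. We have several processes that move millions of records between Sybase and SQL Server in a similar way, and they take less than five minutes.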

Maybe give this a shot:

OracleCommand cmd = new OracleCommand("SELECT ... FROM TableName", oracle_connection);
int batchSize = 500;    
using (OracleDataReader reader = cmd.ExecuteReader())
{
    List<Record> l = new List<Record>(batchSize);
    string[] str = new string[7];
    int currentRow = 0;

    while (reader.Read())
    {
        for (int i = 0; i < 7; i++)
        {
            str[i] = reader[i].ToString();
        }

        l.Add(new Record(str[0], str[1], str[2], str[3], str[4], str[5], str[6]));

        // Commit every time batchSize records have been read
        if (++currentRow == batchSize)
        {
            Commit(l);
            l.Clear();
            currentRow = 0;
        }
    }

    // commit remaining records (skip the call if the last batch came out even)
    if (l.Count > 0)
    {
        Commit(l);
    }
}

This is what Commit might look like:

public void Commit(IEnumerable<Record> records)
{
    // TODO: Use ES's BULK features, I don't know the exact syntax

    client.IndexMany<Record>(records, "index", "type");
    // client.Bulk(b => b.IndexMany(records))... something like this
}
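
The read-then-commit loop above boils down to cutting one stream into fixed-size batches. Below is a minimal sketch of just that batching step, using only the .NET base library. `BatchingSketch` and `Batch` are hypothetical names introduced here for illustration, not part of the original answer or of any Elasticsearch client; each list it yields is what would be handed to a bulk call such as Commit.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingSketch
{
    // Hypothetical helper: splits any sequence into lists of at most batchSize items.
    // The source is enumerated once, so a streaming reader is never fully buffered.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                // Start a fresh list so the caller may keep the one just returned.
                batch = new List<T>(batchSize);
            }
        }

        // Remaining records that did not fill a whole batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

With 1,050 input items and a batch size of 500, this yields three lists of 500, 500 and 50 items, so the final partial batch is not lost and no empty batch is sent.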

Answer 2

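But you are not inserting in batches of 100.
In the end, you are inserting one record at a time
(and that may not even be the correct code for inserting one):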

foreach(Record r in l)
  client.Index<Record>(r, "index", "type");

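If the insert handles only one row at a time, all the gyrations on the read side accomplish nothing.
You are just introducing lag while you fetch the next batch.
Reads are (almost) always faster than writes.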

OracleCommand cmd = new OracleCommand(BuildQuery(all), oracle_connection);
OracleDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
   client.Index<Record>(new Record(reader.GetString(0),
                        reader.GetString(1), reader.GetString(2), reader.GetString(3),
                        reader.GetString(4), reader.GetString(5), reader.GetString(6)),
                        "index", "type");
}
reader.Close();

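If you want to read and write in parallel, you can use a BlockingCollection.
But give it a maximum size so the read side does not get too far ahead of the write side.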
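
A minimal sketch of that bounded producer/consumer arrangement, using only `System.Collections.Concurrent` from the .NET base library. `PipelineSketch`, `Run`, `read`, `write` and `maxQueued` are hypothetical names for illustration: `read` stands in for the OracleDataReader loop and `write` for whatever indexing call is used. Error handling is deliberately thin, so treat it as a starting point rather than a finished implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Reads on a background task and writes on the calling thread.
    // maxQueued caps how far the reader may run ahead of the writer.
    public static int Run<T>(IEnumerable<T> read, Action<T> write, int maxQueued)
    {
        int written = 0;
        using (var queue = new BlockingCollection<T>(boundedCapacity: maxQueued))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (T item in read)
                    {
                        queue.Add(item); // blocks while the queue is full
                    }
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consuming loop finish
                }
            });

            // Consumer: drains items until the producer signals completion.
            foreach (T item in queue.GetConsumingEnumerable())
            {
                write(item);
                written++;
            }

            producer.Wait(); // rethrows any exception from the read side
        }
        return written;
    }
}
```

Because the collection is bounded, `Add` blocks once `maxQueued` items are waiting, so memory use stays flat even when reading is much faster than writing. The items passed through the queue could just as well be whole batches rather than single records.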

Answer 1: score 4, accepted, 2015-06-24 14:15:21
Answer 2: score 3, 2015-06-24 14:25:40
