
Efficient and fastest way for processing and writing huge data to a file

I am trying to replicate the following data one million times and want to write it to a file.

row1,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,sdfsdf,sfgggf,34f

Each time, I want to update the first column to the record number, so the second row will be

row2,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,asrg,awrgtwag,245sfgsfg

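I tried using a StringBuilder, but I cannot append more than 10,000 rows. The program becomes very slow.

Any suggestions?

I am also fine with writing the code in another language.

Below is the code snippet that prepares the data to be written to the file. In my application, I will get the data in the form of an Object[].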

private static void writecsv(Map<String, Object[]> data) throws Exception {
    Set<String> keyset = data.keySet();
    StringBuilder sb = new StringBuilder();
    for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
        for (String key : keyset) {
            Object[] objArr = data.get(key);
            for (Object obj : objArr) {
                if (obj == null)
                    obj = BLANK;
                sb.append(obj.toString() + COMMA);
                sb.toString();
            }
            sb.setLength(sb.length() - 1);
            sb.append(NEW_LINE);
        }
    }
    System.out.print(sb.toString());
}

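If you print directly to System.out inside the inner for loop, you do not have to buffer everything in the StringBuilder.

You want to write to a file, but I do not see any OutputStream or FileWriter in your code.

Don't use the StringBuilder as a buffer.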

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.Map;
import java.util.Set;

private static final int OUTPUT_RECORD_COUNT = 1000000;
private static final String BLANK = "";
private static final String COMMA = ",";
private static final String FILE_ENCODING = "Cp1252"; // Windows-ANSI


/*
 * Creates a String for the fields in array fields by joining 
 * the String values with COMMA separator.
 * First character is also a COMMA because later we will put one field
 * in front of the resulting string.
 */
private static String createLine(Object[] fields) {
  StringBuilder sb = new StringBuilder();
  for(Object field: fields) {
    sb.append(COMMA).append(field == null ? BLANK : field.toString());
  }
  return sb.toString();
}


/*
 * Added the fileName parameter.
 */
private static void writecsv(Map<String, Object[]> data, String fileName) throws Exception {
  Set<String> keyset = data.keySet();

  // Use a
  // - FileOutputStream to write bytes to file
  // - OutputStreamWriter to convert text strings to bytes according to a character encoding
  // - BufferedWriter to use an in-memory buffer for writing to the file
  // - PrintWriter for convenience methods like println()
  PrintWriter out = new PrintWriter(new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(fileName), FILE_ENCODING)));

  try {
    // It seems each key represents one original line
    for (String key : keyset) {
      // Create each line - at least the part after the "rowX" - only once.
      String line = createLine(data.get(key));

      // And you want every line duplicated OUTPUT_RECORD_COUNT times
      for (int count = 1; count <= OUTPUT_RECORD_COUNT; count++) {
        // Put "rowX" in front of every line, where X is the record number
        // (starting at 1, as in the question's sample rows).
        out.print("row");
        out.print(count);
        out.println(line);
      }
    }
  } finally {
    // Close the Writer even in case of an exception.
    out.flush();
    out.close();
  }
}

Hmm, have you tried using bash?

#!/bin/bash
var=1
while [ $var -le 1000000 ]
do
    echo "$var" >> temp
    var=$(( $var + 1 ))
done

I tried running this script, and it took a few minutes to finish appending 1 million lines.

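Your code keeps all the data in memory, which is why it does not scale. Instead, you should open the file up front and then write to it line by line.

See this answer, for example, for a simple example of how to do this.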
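A minimal sketch of the "open the file first, then write line by line" idea, assuming Java 7 or later so that try-with-resources and java.nio.file are available. The class name StreamingCsvSketch, the helper writeRows, the UTF-8 encoding and the temporary output file are all assumptions made for illustration and are not taken from the question. Only one line is held in memory at a time, and the BufferedWriter batches the actual disk writes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCsvSketch {

    // Hypothetical helper: writes `copies` rows to `target`, each one
    // prefixed with "row<N>" where N counts up from 1.
    static void writeRows(Path target, Object[] fields, int copies) throws IOException {
        // Build the part after "rowN" only once; null fields become empty strings.
        StringBuilder tail = new StringBuilder();
        for (Object field : fields) {
            tail.append(',').append(field == null ? "" : field.toString());
        }
        String suffix = tail.toString();

        // try-with-resources closes (and flushes) the writer even on an exception.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int n = 1; n <= copies; n++) {
                out.write("row");
                out.write(Integer.toString(n));
                out.write(suffix);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp file stands in for the real output path.
        Path target = Files.createTempFile("rows", ".csv");
        writeRows(target, new Object[] {"Test", 2.0, null, "sdgh"}, 1_000_000);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Unlike PrintWriter, BufferedWriter reports write failures as IOException instead of swallowing them, which may be preferable when the output file is large.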

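Also note that when you are serious about writing proper CSV, you should consider using a library for it, such as opencsv. It will then take care of things like correct quoting for you.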
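To illustrate what "correct quoting" involves, here is a rough sketch of the common convention described in RFC 4180: a field containing a comma, a double quote or a line break is wrapped in double quotes, and embedded double quotes are doubled. This is not opencsv's API; CsvQuotingSketch, quoteField and toCsvLine are hypothetical names, and a real library covers more cases (configurable separators, escape characters and so on).

```java
public class CsvQuotingSketch {

    // Hypothetical helper: quote a single field only when it needs it.
    static String quoteField(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: join the quoted fields with commas.
    static String toCsvLine(Object[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quoteField(fields[i] == null ? null : fields[i].toString()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: row1,plain,"a,b","say ""hi""",
        System.out.println(toCsvLine(new Object[] {"row1", "plain", "a,b", "say \"hi\"", null}));
    }
}
```

The sample data in the question contains a field such as " dsghgh; sdgh" with no embedded commas, so naive joining happens to work there, but it would produce a misaligned row as soon as a value contains the separator.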
