
Efficient and fastest way for processing and writing huge data to a file

I am trying to replicate the following data one million times and write it to a file.

row1,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,sdfsdf,sfgggf,34f

Each time, I want to update the first column to the record number, so the second row would be

row2,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,asrg,awrgtwag,245sfgsfg

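I tried using StringBuilder, but I cannot append more than 10,000 rows: the program becomes very slow.

Any suggestions?

When I wrote the same code in another language, it worked fine.

Below is the code snippet that prepares the data to be written to the file. In my application I will receive the data as Object[].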

    private static void writecsv(Map<String, Object[]> data) throws Exception {
        Set<String> keyset = data.keySet();
        StringBuilder sb = new StringBuilder();
        for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
            for (String key : keyset) {
                Object[] objArr = data.get(key);
                for (Object obj : objArr) {
                    if (obj == null)
                        obj = BLANK;
                    sb.append(obj.toString() + COMMA);
                    sb.toString();
                }
                sb.setLength(sb.length() - 1);
                sb.append(NEW_LINE);
            }
        }
        System.out.print(sb.toString());
    }

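If you print directly to System.out in the inner for loop, you don't have to buffer everything in the StringBuilder.

You want to write to a file, but I don't see any OutputStream or FileWriter in your code.

Don't use the StringBuilder as a buffer.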
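To make the comment above concrete, here is a minimal sketch of emitting each row as soon as it is built instead of collecting everything first. The class name `StreamingRows` and the helper `appendRow` are made up for illustration; the sketch also assumes the field array holds only the columns after the `rowX` label.

```java
import java.io.IOException;

final class StreamingRows {
    // Hypothetical helper: writes one row straight to `out`, field by field,
    // so nothing larger than a single field is ever held in memory.
    static void appendRow(Appendable out, int rowNumber, Object[] fields) throws IOException {
        out.append("row").append(Integer.toString(rowNumber));
        for (Object field : fields) {
            out.append(',').append(field == null ? "" : field.toString());
        }
        out.append('\n');
    }

    public static void main(String[] args) throws IOException {
        Object[] fields = {"Test", 2.0, null, "sdgh"};
        // System.out is a PrintStream, which implements Appendable.
        for (int i = 1; i <= 3; i++) {
            appendRow(System.out, i, fields);
        }
    }
}
```

Because the helper takes an `Appendable`, the same code works with `System.out`, a `Writer` on a file, or a `StringBuilder` in a unit test.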

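import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.Map;
import java.util.Set;

private static final int OUTPUT_RECORD_COUNT = 1000000;
private static final String BLANK = "";
private static final String COMMA = ",";
private static final String FILE_ENCODING = "Cp1252"; // Windows-ANSI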


/*
 * Creates a String for the fields in array fields by joining 
 * the String values with COMMA separator.
 * First character is also a COMMA because later we will put one field
 * in front of the resulting string.
 */
private static String createLine(Object[] fields) {
  StringBuilder sb = new StringBuilder();
  for(Object field: fields) {
    sb.append(COMMA).append(field == null ? BLANK : field.toString());
  }
  return sb.toString();
}


/*
 * Added the fileName parameter.
 */
private static void writecsv(Map<String, Object[]> data, String fileName) throws Exception {
  Set<String> keyset = data.keySet();

  // Use a
  // - FileOutputStream to write bytes to file
  // - OutputStreamWriter to convert text strings to bytes according to a character encoding
  // - BufferedWriter to use an in-memory buffer for writing to the file
  // - PrintWriter for convenience methods like println()
  PrintWriter out = new PrintWriter(new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(fileName), FILE_ENCODING)));

  try {
    // It seems each key represents one original line
    for (String key : keyset) {
      // Create each line - at least the part after the "rowX" - only once.
      String line = createLine(data.get(key));

      // Write every line OUTPUT_RECORD_COUNT times.
      for (int count = 1; count <= OUTPUT_RECORD_COUNT; count++) {
        // Put "rowX" in front of every line, where X is the value of count (starting at 1).
        out.print("row");
        out.print(count);
        out.println(line);
      }
    }
  } finally {
    // Close the Writer even in case of an exception.
    out.flush();
    out.close();
  }
}

Hmm, have you tried using bash?

#!/bin/bash
var=1
while [ $var -le 1000000 ]
do
    echo "$var" >> temp
    var=$(( $var + 1 ))
done

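I tried running this script; appending 1 million lines took a few minutes.

Your code keeps all the data in memory, which is why it doesn't scale. Instead, you should open the file beforehand and then write to it line by line.

See this answer, for example, for a simple example of how to do that.

Also note that if you are serious about writing proper CSV, you should consider using a library for that, such as opencsv. It will then take care of things like correct quoting for you.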
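As a rough sketch of "open the file beforehand, then write line by line", something like the following should work. The names `LineByLineWriter` and `writeRows`, the UTF-8 encoding, and the temp-file target are my assumptions, not taken from the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineByLineWriter {
    // Hypothetical helper: writes "row1<tail>", "row2<tail>", ... to `target`.
    // `lineTail` is the part of the line after the row label, leading comma included.
    static void writeRows(Path target, String lineTail, int recordCount) throws IOException {
        // try-with-resources flushes and closes the writer even if an exception occurs.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int row = 1; row <= recordCount; row++) {
                out.write("row");
                out.write(Integer.toString(row));
                out.write(lineTail);
                out.newLine(); // platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("rows", ".csv");
        writeRows(target, ",Test,2.0,1305033.0", 1_000_000);
        System.out.println("Wrote " + target);
    }
}
```

Only the writer's internal buffer is held in memory at any time, so memory use stays flat no matter how many rows are written.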
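To illustrate what "correct quoting" means here, below is a small hand-rolled sketch following the usual CSV convention (RFC 4180 style): fields containing a comma, a double quote, or a line break are wrapped in double quotes, and embedded double quotes are doubled. This is not opencsv's API; `CsvQuoting`, `quote`, and `toLine` are hypothetical names, and a real library handles more edge cases.

```java
final class CsvQuoting {
    // Quotes a single field only when it contains a character that would
    // otherwise break the CSV structure. null becomes an empty field.
    static String quote(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the fields of one record into a single CSV line (no line terminator).
    static String toLine(Object[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i] == null ? null : fields[i].toString()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine(new Object[] {"a,b", "say \"hi\"", null, 1.0}));
    }
}
```

The sample data in the question contains no commas or quotes inside fields, so plain joining happens to work there, but real data usually won't be that tidy.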

