Fastest, most efficient way to process and write huge data to a file

I am trying to duplicate the data below 1 million times and write it to a file.

row1,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,sdfsdf,sfgggf,34f

Each time, I want to update the first column to the record number, so my second row will be:

row2,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,asrg,awrgtwag,245sfgsfg

I tried using a StringBuilder, but I am not able to append more than 10,000 rows; the program becomes very slow.

Any suggestions?

I'm also fine with writing the code in other languages.

Below is the code snippet that prepares the data to write to the file. In my app, I get the data as Object[].

    private static void writecsv(Map<String, Object[]> data) throws Exception {
        Set<String> keyset = data.keySet();
        // All generated rows are accumulated in a single in-memory buffer
        StringBuilder sb = new StringBuilder();
        for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
            for (String key : keyset) {
                Object[] objArr = data.get(key);
                for (Object obj : objArr) {
                    if (obj == null)
                        obj = BLANK;
                    sb.append(obj.toString() + COMMA);
                    // Copies the whole buffer built so far; the result is discarded
                    sb.toString();
                }
                // Remove the trailing comma and end the row
                sb.setLength(sb.length() - 1);
                sb.append(NEW_LINE);
            }
        }
        System.out.print(sb.toString());
    }

If you print directly to System.out inside the inner for loop, you don't need to buffer everything in a StringBuilder.
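A minimal sketch of that idea, assuming the part of the row after the first column is a fixed string (the class name StreamToStdout and the shortened REST constant are hypothetical placeholders). Each line goes straight into a buffered PrintWriter wrapped around System.out, so memory use stays constant no matter how many rows are written.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class StreamToStdout {

    // Hypothetical, shortened stand-in for the fields after "rowN" in the question.
    static final String REST = ",Test,2.0,1305033.0,3.0,sdfgfsg";

    // Writes rowCount lines ("row1...", "row2...", ...) without keeping them in memory.
    static void writeRows(PrintWriter out, int rowCount, String rest) {
        for (int i = 1; i <= rowCount; i++) {
            out.print("row");
            out.print(i);
            out.println(rest);
        }
    }

    public static void main(String[] args) {
        // No autoflush: println() fills the BufferedWriter's buffer instead of
        // flushing System.out on every line.
        PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8)));
        writeRows(out, 1_000_000, REST);
        out.flush(); // push the last buffered chunk before the program exits
    }
}
```

Run it as java StreamToStdout > out.csv to send the output to a file.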

You want to write to a file, but I don't see any OutputStream or FileWriter in your code.

Don't use a StringBuilder as a buffer.

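import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.Map;
import java.util.Set;

private static final int OUTPUT_RECORD_COUNT = 1000000;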
private static final String BLANK = "";
private static final String COMMA = ",";
private static final String FILE_ENCODING = "Cp1252"; // Windows-ANSI


/*
 * Creates a String for the fields in array fields by joining 
 * the String values with COMMA separator.
 * First character is also a COMMA because later we will put one field
 * in front of the resulting string.
 */
private static String createLine(Object[] fields) {
  StringBuilder sb = new StringBuilder();
  for(Object field: fields) {
    sb.append(COMMA).append(field == null ? BLANK : field.toString());
  }
  return sb.toString();
}


/*
 * Added the fileName parameter.
 */
private static void writecsv(Map<String, Object[]> data, String fileName) throws Exception {
  Set<String> keyset = data.keySet();

  // Use a
  // - FileOutputStream to write bytes to file
  // - OutputStreamWriter to convert text strings to bytes according to a character encoding
  // - BufferedWriter to use an in-memory buffer for writing to the file
  // - PrintWriter for convenience methods like println()
  PrintWriter out = new PrintWriter(new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(fileName), FILE_ENCODING)));

  try {
    // It seems each key represents one original line
    for (String key : keyset) {
      // Create each line - at least the part after the "rowX" - only once.
      String line = createLine(data.get(key));

      // And you want every line OUTPUT_RECORD_COUNT times duplicates
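      for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
        // Put "rowX" in front of every line, where X is count + 1 (row1, row2, ...).
        out.print("row");
        out.print(count + 1);
        out.println(line);
      }
    }
    // PrintWriter swallows IOExceptions; checkError() flushes and reports whether one occurred.
    if (out.checkError()) {
      throw new IOException("Error while writing " + fileName);
    }
  } finally {
    // Close the Writer even in case of an exception (close() also flushes).
    out.close();
  }
}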

Ummm, have you tried using bash?

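#!/bin/bash
# Appends the numbers 1..1000000 to the file "temp", one per line.
# Note: ">>" reopens and closes the file on every iteration.
var=1
while [ $var -le 1000000 ]
do
    echo "$var" >> temp
    var=$(( $var + 1 ))
done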

I ran this script, and it took around a couple of minutes to append 1 million lines.

Your code keeps all the data in memory, which is why it cannot scale. Instead, you should open the file beforehand and then write to it line by line.
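A minimal sketch of opening the file once and writing it line by line, assuming Java 7 or later (java.nio.file) and UTF-8 output. The class name LineByLineCsv, the output path out.csv and the shortened row tail are illustrative placeholders, not part of the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineByLineCsv {

    // Opens the file once, streams rowCount lines into it, and closes it
    // automatically (try-with-resources), even if an exception is thrown.
    static void writeDuplicatedRows(Path file, String rest, int rowCount) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int i = 1; i <= rowCount; i++) {
                w.write("row");
                w.write(Integer.toString(i)); // record number: row1, row2, ...
                w.write(rest);                // remaining fields, starting with a comma
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path and shortened row tail; substitute the real data.
        writeDuplicatedRows(Paths.get("out.csv"), ",Test,2.0,1305033.0,3.0,sdfgfsg", 1_000_000);
    }
}
```

Memory use no longer depends on the number of rows, since only the BufferedWriter's small internal buffer is held at any time.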

See, e.g., this answer for a simple example of how to do this.

Also note that if you are serious about writing proper CSV, you should consider using a library for it, such as opencsv. Then things like proper quoting will be handled for you.
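If adding a library is not an option, the core quoting rule (as described in RFC 4180) is small enough to sketch by hand: a field containing a comma, a double quote or a line break is wrapped in double quotes, and embedded double quotes are doubled. The class and method names below are hypothetical, and this sketch covers only writing, not the other details a CSV library handles.

```java
public class CsvQuoting {

    // Quotes a field RFC 4180-style: if it contains a comma, double quote, CR or LF,
    // wrap it in double quotes and double every embedded double quote.
    static String quote(String field) {
        if (field == null) {
            return ""; // null becomes an empty field, as in the question's code
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the fields of one record into a single CSV line, quoting where needed.
    static String toLine(Object[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i] == null ? null : fields[i].toString()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine(new Object[] {"row1", "a, b", null, "say \"hi\""}));
        // prints: row1,"a, b",,"say ""hi"""
    }
}
```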
