
Efficient and fastest way for processing and writing huge data to a file

I am trying to duplicate the data below 1 million times and write it to a file.

row1,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,sdfsdf,sfgggf,34f

Each time, I want to update the first column to the record number, so my second row will be:

row2,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,asrg,awrgtwag,245sfgsfg

I tried using a StringBuilder, but I can't append more than 10,000 rows; the program becomes very slow.

Any suggestions?

I'm open to writing the code in other languages.

Below is the code snippet that prepares the data to write to the file. In my app, I'll get the data as Object[].

    private static void writecsv(Map<String, Object[]> data) throws Exception {
        Set<String> keyset = data.keySet();
        StringBuilder sb = new StringBuilder();
        for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
            for (String key : keyset) {
                Object[] objArr = data.get(key);
                for (Object obj : objArr) {
                    if (obj == null)
                        obj = BLANK;
                    sb.append(obj.toString() + COMMA);
                    sb.toString();
                }
                sb.setLength(sb.length() - 1);
                sb.append(NEW_LINE);
            }
        }
        System.out.print(sb.toString());
    }

If you print directly to System.out inside the inner for loop, you don't have to buffer everything in a StringBuilder.
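A minimal sketch of printing inside the loop instead of buffering, assuming the same `Map<String, Object[]>` input as the question and assuming the array holds the fields that follow the "rowN" label. Each record is written as soon as it is formatted, so memory use stays constant regardless of the row count. The class name `StreamingCsv` and the `Writer` parameter are illustrative; taking a `Writer` lets the same method target `System.out` or a file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class StreamingCsv {

    // Writes each record as soon as it is formatted instead of collecting
    // everything in one StringBuilder; nothing accumulates between records.
    public static void writeRecords(Map<String, Object[]> data, int copies, Writer out)
            throws IOException {
        int rowNumber = 1; // running record number across all copies
        for (int count = 0; count < copies; count++) {
            for (Object[] fields : data.values()) {
                out.write("row");
                out.write(Integer.toString(rowNumber++));
                for (Object field : fields) {
                    out.write(',');
                    out.write(field == null ? "" : field.toString());
                }
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object[]> data = new LinkedHashMap<>();
        data.put("r", new Object[] {"Test", 2.0, null, "34f"});
        // Buffer System.out so many small writes don't each hit the console.
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        writeRecords(data, 3, out);
        out.flush(); // flush, but do not close System.out
    }
}
```

Passing a `BufferedWriter` over a `FileOutputStream` instead of `System.out` writes the same rows to a file without changing `writeRecords`.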

You want to write to a file, but I don't see any OutputStream or FileWriter in your code.

Don't use a StringBuilder as a buffer.

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.Map;
import java.util.Set;

public class CsvExport { // class name is illustrative

  private static final int OUTPUT_RECORD_COUNT = 1000000;
  private static final String BLANK = "";
  private static final String COMMA = ",";
  private static final String FILE_ENCODING = "Cp1252"; // Windows-ANSI


  /*
   * Creates a String for the fields in array fields by joining
   * the String values with COMMA separator.
   * The first character is also a COMMA because later we will put one field
   * in front of the resulting string.
   */
  private static String createLine(Object[] fields) {
    StringBuilder sb = new StringBuilder();
    for (Object field : fields) {
      sb.append(COMMA).append(field == null ? BLANK : field.toString());
    }
    return sb.toString();
  }


  /*
   * Added the fileName parameter.
   */
  private static void writecsv(Map<String, Object[]> data, String fileName) throws IOException {
    Set<String> keyset = data.keySet();
    int rowNumber = 1; // the question numbers rows starting at row1

    // Use a
    // - FileOutputStream to write bytes to file
    // - OutputStreamWriter to convert text strings to bytes according to a character encoding
    // - BufferedWriter to use an in-memory buffer for writing to the file
    // - PrintWriter for convenience methods like println()
    // try-with-resources flushes and closes the Writer even in case of an exception.
    try (PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(fileName), FILE_ENCODING)))) {
      // It seems each key represents one original line
      for (String key : keyset) {
        // Create each line - at least the part after the "rowX" - only once.
        String line = createLine(data.get(key));

        // And you want every line duplicated OUTPUT_RECORD_COUNT times
        for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
          // Put "rowX" in front of every line, where X is the running record number.
          out.print("row");
          out.print(rowNumber++);
          out.println(line);
        }
      }
      // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
      if (out.checkError()) {
        throw new IOException("Error while writing " + fileName);
      }
    }
  }
}

Ummm, have you tried using bash?

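#!/bin/bash
# Fields after the "rowN" prefix, copied from the question's sample record
rest=',Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,sdfsdf,sfgggf,34f'
var=1
while [ "$var" -le 1000000 ]
do
    echo "row$var$rest"
    var=$(( var + 1 ))
done > temp   # redirect once for the whole loop instead of reopening the file per line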

I tried running the script, and it took a couple of minutes to append 1 million lines.

Your code is keeping all the data in memory, which is why it cannot scale. Instead, you should open the file beforehand and then write to it line by line.
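A minimal sketch of "open the file first, then write line by line" using `java.nio.file.Files.newBufferedWriter`. The class name `LineByLineCsv`, the file name `out.csv`, the UTF-8 encoding and the shortened sample `suffix` are placeholders chosen for illustration; in the question's setting, `suffix` would be the part of the record after "rowN".

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineByLineCsv {

    // Opens the file once, then writes one line per record. The BufferedWriter
    // batches the small writes, and only the current line is ever in memory.
    public static void write(Path file, String suffix, int rows) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int i = 1; i <= rows; i++) {
                writer.write("row");
                writer.write(Integer.toString(i));
                writer.write(suffix); // the part of the record after "rowN"
                writer.newLine();     // platform line separator
            }
        } // try-with-resources flushes and closes the file, even on exceptions
    }

    public static void main(String[] args) throws IOException {
        // Shortened sample suffix; substitute the full record from the question.
        write(Paths.get("out.csv"), ",Test,2.0,1305033.0,3.0", 1_000_000);
    }
}
```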

See, e.g., this answer for a simple example of how to do this.

Also note that if you are serious about writing proper CSV, you should consider using a library such as opencsv. Then things like proper quoting will be handled for you.
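To illustrate the kind of quoting such a library takes care of, here is a hand-rolled sketch of RFC 4180-style escaping. This is not opencsv's API; the class and method names are hypothetical. A field is wrapped in double quotes if it contains a comma, a double quote, CR or LF, and any embedded double quotes are doubled.

```java
import java.util.StringJoiner;

public class CsvQuoting {

    // Quotes a single field following RFC 4180 conventions; null becomes an empty field.
    public static String quote(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Quotes each field as needed and joins them into one CSV line (no line terminator).
    public static String toLine(String... fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String f : fields) {
            joiner.add(quote(f));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine("row1", "a, b", "say \"hi\"", null));
    }
}
```

Without the quoting, a field such as `a, b` would be read back as two separate columns.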
