
Split huge text files using Java to read them

I am working on parsing a log file that is larger than 2 GB. The requirement is to print some predefined words, along with their time stamps, to a text/CSV file. I have written the code below. It works fine on a small piece of the log file, but with the actual 2 GB input file I get an out-of-memory error. Please help me resolve this.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;


    public class MyFileParser{
        private static final String COMMA_STR = ",";
        private static final String NEW_LINE_STR = "\n";


        // formatter.parse() below throws the checked ParseException
        public static void main(String[] args)  throws IOException, ParseException{

            String searchString = "" ;
            String line = null;
            boolean searchFlag = false;
            StringBuffer sbr = new StringBuffer();

            FileReader reader = new FileReader("C:\\Users\\Kiran\\Desktop\\mylogs\\File.txt");
            FileWriter writter = new FileWriter("output.csv");
            BufferedReader br = new BufferedReader(reader);


            while( (line = br.readLine())  != null){

                if(line.contains("prompf1") ){
                    searchString= "prompf1";
                    searchFlag = true;
                }

                else if (line.contains("prompf9")){
                    searchString = "prompf9";
                    searchFlag = true;
                }
                if(searchFlag){
                    String timeStamp = "";
                    int count = 0;
                    char[] charArray = line.toCharArray();
                    // '<' rather than '<=': index charArray.length is out of bounds
                    for(int i = 0; i < charArray.length; i++){
                        // skip the [ and ] at the beginning and end of the time stamp
                        if(charArray[i] == '[' || charArray[i] == ']'){
                            count++;
                        }
                        else
                            timeStamp = timeStamp + charArray[i];
                        if(count == 2){
                            break;
                        }

                    }
                    SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                    Date date = formatter.parse(timeStamp);
                    searchString = date + COMMA_STR+ searchString; 
                    sbr.append(searchString);
                    sbr.append(NEW_LINE_STR);
                    System.out.println(searchString);

                }

            }

            writter.write(sbr.toString());
            writter.flush();
            writter.close();

        }


    }

You are collecting all the output lines in a StringBuffer and then writing them out in a single operation. That is unnecessary and will cause memory problems if there are many such lines.

What you should do is write each line as soon as you have it, inside the reading loop. Then you won't need to buffer anything and the memory consumption should drop drastically.
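A minimal sketch of that idea, not the asker's actual program: each matching line is written as soon as it is read, so no more than one line is held in memory at a time. The class name StreamingFilter, the method copyMatches and the output layout (keyword, comma, original line) are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper: streams matching lines straight to the output.
class StreamingFilter {

    // Writes "keyword,line" for every line that contains one of the keywords.
    // Returns the number of lines written. Nothing is accumulated in memory.
    static int copyMatches(Reader in, Writer out, String... keywords) throws IOException {
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        int written = 0;
        String line;
        while ((line = br.readLine()) != null) {
            for (String keyword : keywords) {
                if (line.contains(keyword)) {
                    // Write immediately instead of appending to a StringBuffer.
                    bw.write(keyword);
                    bw.write(',');
                    bw.write(line);
                    bw.write('\n');
                    written++;
                    break; // first matching keyword wins, as in the if/else-if chain
                }
            }
        }
        bw.flush(); // the caller owns and closes the underlying streams
        return written;
    }

    public static void main(String[] args) throws IOException {
        // In-memory demo input; the sample lines are made up.
        StringWriter out = new StringWriter();
        copyMatches(new StringReader("a prompf1 b\nskip me\nc prompf9 d\n"),
                    out, "prompf1", "prompf9");
        System.out.print(out);
    }
}
```

With real files you would pass a FileReader for the log and a FileWriter for output.csv instead of the in-memory streams, ideally opened in a try-with-resources block so both are closed even if an exception is thrown.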

Also, do you actually have dateTime = timeStamp+ charArray[i]; in your code? Shouldn't that be timeStamp = timeStamp+ charArray[i];? Anyway, it would be more efficient to find the [ and ] with String#indexOf() and extract the date string with String#substring().
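As a rough illustration of the indexOf/substring approach, assuming each log line carries its time stamp in the first pair of square brackets (the sample line below is invented, and TimestampExtractor and extractTimestamp are hypothetical names):

```java
// Hypothetical helper: pulls the bracketed time stamp out of a log line
// without building the string one character at a time.
class TimestampExtractor {

    // Returns the text between the first '[' and the next ']',
    // or null if the line has no such pair.
    static String extractTimestamp(String line) {
        int open = line.indexOf('[');
        if (open < 0) {
            return null;
        }
        int close = line.indexOf(']', open + 1);
        if (close < 0) {
            return null;
        }
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(extractTimestamp("[2014-05-01 10:15:30] prompf1 started"));
    }
}
```

This avoids allocating a new String for every character, which is what repeated timeStamp + charArray[i] concatenation does. Returning null for lines without brackets is a choice of this sketch; you may prefer to skip or log such lines differently.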

And there is no need to create a new SimpleDateFormat for each line - create it before the reading loop.
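Here is one way the loop could look with the formatter created once, combined with the other suggestions in this answer. It is a sketch under assumptions: the class name LogKeywordExtractor, the sample log lines, the decision to skip lines whose bracketed text does not parse, and writing the date back in the same yyyy-MM-dd HH:mm:ss pattern (rather than Date#toString()) are all choices made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical rewrite of the parsing loop.
class LogKeywordExtractor {
    private static final String[] KEYWORDS = {"prompf1", "prompf9"};

    // Returns the first keyword found in the line, or null if none matches.
    static String findKeyword(String line) {
        for (String keyword : KEYWORDS) {
            if (line.contains(keyword)) {
                return keyword;
            }
        }
        return null;
    }

    // Writes "timestamp,keyword" for each matching line; returns the count.
    static int extract(Reader in, Writer out) throws IOException {
        // Created once, before the loop. SimpleDateFormat is not thread-safe,
        // which does not matter in this single-threaded loop.
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        int written = 0;
        String line;
        while ((line = br.readLine()) != null) {
            String keyword = findKeyword(line); // evaluated afresh for every line
            int open = line.indexOf('[');
            int close = line.indexOf(']', open + 1);
            if (keyword == null || open < 0 || close < 0) {
                continue;
            }
            try {
                Date date = formatter.parse(line.substring(open + 1, close));
                bw.write(formatter.format(date) + "," + keyword + "\n");
                written++;
            } catch (ParseException e) {
                // Assumption of this sketch: silently skip unparseable stamps.
            }
        }
        bw.flush(); // the caller owns and closes the underlying streams
        return written;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        extract(new StringReader("[2014-05-01 10:15:30] user hit prompf1\n"), out);
        System.out.print(out);
    }
}
```

Because the keyword is looked up per line, a match on one line does not carry over to the lines that follow it.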
