
How to read multiple CSV files and merge them

I have 39 CSV files, each of which is large, usually around 100 MB to 800 MB. I want to load these files in Java and store them in a single variable. The code below works for small files but fails for large ones. I want to load all 39 files in a directory and put them into one 2D array.

public static String readCSV(File csvFile) {
    BufferedReader bufferedReader = null;
    StringBuffer stringBuffer = new StringBuffer();

    try {
        bufferedReader = new BufferedReader(new FileReader(csvFile));
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    try {
        String temp = null;
        while((temp = bufferedReader.readLine()) != null) {
            stringBuffer.append(temp+","); // append the line stored in temp
        }

        System.out.println(stringBuffer);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // returns e.g. -10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,,,,,,,,,,1,2,3,4,5,6,7,8,9,10,
    return stringBuffer.toString();
}

public static String[] parse(String str) {
    String[] strArr = str.split(","); // split on each single comma and store the pieces in an array

    return strArr; 
}

public static void main(String[] args) throws IOException {

    //mergeCsvFiles("sample", 4, "D:\\sample_folder\\" + "merge_file" + ".csv");

    String str = readCSV(new File("D:/sample_folder/sample1.csv"));
    String[] strArr = parse(str); // the values come back packed in order in a String array
    int varNumber = 45;
    int rowNumber = strArr.length / varNumber;

    // Build the 2D array, indexed as [variable][row]
    String[][] Array2D = new String[varNumber][rowNumber];
    for (int j = 0; j < varNumber; j++) {
        for (int i = 0; i < rowNumber; i++) {
            String k = strArr[i * varNumber + j];
            Array2D[j][i] = k;
        }
    }

    //String[][] naArray2D=removeNA(Array2D,rowNumber,varNumber); // remove rows containing NA

    // Code to check that the removal worked correctly
    for (int i = 0; i < varNumber; i++) {
        for (int j = 0; j < 16; j++) {
            System.out.println(Array2D[i][j]);
        }
        System.out.println("**********************NA removed & 2D array**********************");
    }

    }
}

With the file sizes you mention, you are most likely running out of memory in the JVM.

This is probably why your largest file, at 800-some MB, isn't loading. Not only are you loading those 800 MB into memory, you are also adding the overhead of the arrays you build from them. In other words, you're using roughly 1600 MB plus the extra overhead of each array, which becomes sizable.
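One way to avoid holding the whole file as a single giant string and then copying it again when splitting is to split each line as it is read, so that only the parsed rows stay in memory. The following is a minimal sketch rather than a drop-in fix: the class name `CsvStreamSketch` and the method `readRows` are made up for illustration, and it assumes UTF-8 files with plain comma-separated values (no quoted fields or embedded commas).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvStreamSketch {

    // Hypothetical helper: reads one CSV file row by row.
    // Assumes UTF-8 text, no quoted fields, no embedded commas.
    public static List<String[]> readRows(Path csvFile) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields,
                // so every row keeps its full column count.
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Small self-contained demo using a temporary file.
        Path demo = Files.createTempFile("demo", ".csv");
        Files.write(demo, Arrays.asList("1,2,3", "4,,"));
        for (String[] row : readRows(demo)) {
            System.out.println(Arrays.toString(row));
        }
        Files.delete(demo);
    }
}
```

Even with this change, every field is still its own `String` object, so 39 files of this size may not fit in the heap at once.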

My bet is that you are exceeding the memory limit, assuming the file format is valid in both the small and the large case. I can't confirm this, since I don't know your JVM settings or your memory consumption and don't have the files to test with, so it is up to you to check whether that is the case.
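To check this yourself, you can ask the running JVM how much heap it is allowed to use. The sketch below is only an illustration (the class name `HeapCheck` and its method names are hypothetical); it relies on the standard `Runtime` methods `maxMemory()`, `totalMemory()` and `freeMemory()`.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Heap currently occupied by objects (approximate), in megabytes.
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

If the reported maximum is smaller than the data you are trying to hold, the limit can be raised with the `-Xmx` option, for example `java -Xmx4g HeapCheck`. Whether 4 GB is enough for your data is something you would have to measure.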

Also, maybe I'm reading your code wrong, but it doesn't seem like it will do what I think you want it to do. I may be mistaken, since I don't know exactly what you're trying to do.
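If the goal is what the question describes, loading every CSV file in a directory into one 2D array, a rough sketch could look like the following. Everything named here (`CsvMergeSketch`, `mergeDirectory`) is hypothetical, and the sketch assumes UTF-8 files, plain comma-separated values without quoted fields, and that any header lines should simply be kept as rows. Note that the result is indexed as `[row][column]`, the opposite of the `[variable][row]` layout in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CsvMergeSketch {

    // Hypothetical helper: appends the rows of every *.csv file in dir,
    // in file-name order, and returns them as one [row][column] array.
    public static String[][] mergeDirectory(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        // Directory iteration order is unspecified, so sort for a stable result.
        Collections.sort(files);

        List<String[]> rows = new ArrayList<>();
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Split per line; -1 keeps trailing empty fields.
                    rows.add(line.split(",", -1));
                }
            }
        }
        return rows.toArray(new String[0][]);
    }
}
```

A call such as `String[][] data = CsvMergeSketch.mergeDirectory(Paths.get("D:/sample_folder"));` would then replace the single-file `readCSV` and `parse` steps. With files of several hundred megabytes each, the merged array may still exceed the available heap, in which case processing one file at a time is the safer design.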
