Merging sorted files using multithreading

Multithreading is new to me, so sorry for any mistakes.

I have written the program below, which merges files using multithreading, but I can't figure out how to handle the last file, or how to merge the newly created files after one iteration.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;

public class MergerSorter extends Thread {
int fileNumber = 1;

public static void main(String[] args) {
    startMergingfiles(9);
}

public MergerSorter(int fileNum) {
    fileNumber = fileNum;
}

public static void startMergingfiles(int numberOfFiles) {
    int objectcounter = 0;

    while (numberOfFiles != 1) {
        try {
            ArrayList<MergerSorter> objectList = new ArrayList<MergerSorter>();
            for (int j = 1; j <= numberOfFiles; j = j + 2) {
                if (numberOfFiles == j) {// Last Single remaining File

                } else {
                    objectList.add(new MergerSorter(j));
                    objectList.get(objectcounter).start();
                    objectList.get(objectcounter).join();
                    objectcounter++;
                }
            }
            objectcounter = 0;
            numberOfFiles = numberOfFiles / 2;

        } catch (Exception e) {
            System.out.println(e);
        }

    }
}

public void run() {

    try {
        FileReader fileReader1 = new FileReader("src/externalsort/" + Integer.toString(fileNumber));
        FileReader fileReader2 = new FileReader("src/externalsort/" + Integer.toString(fileNumber + 1));
        BufferedReader bufferedReader1 = new BufferedReader(fileReader1);
        BufferedReader bufferedReader2 = new BufferedReader(fileReader2);

        String line1 = bufferedReader1.readLine();
        String line2 = bufferedReader2.readLine();

        FileWriter tmpFile = new FileWriter("src/externalsort/" + Integer.toString(fileNumber) + "op.txt", false);
        int whichFileToRead = 0;

        boolean file_1_reader = true;
        boolean file_2_reader = true;

        while (file_1_reader || file_2_reader) {
            if (file_1_reader == false) {
                tmpFile.write(line2 + "\r\n");
                whichFileToRead = 2;
            } else if (file_2_reader == false) {
                tmpFile.write(line1 + "\r\n");
                whichFileToRead = 1;
            } else {
                String value1 = line1.substring(0, 10);
                String value2 = line2.substring(0, 10);
                int ans = value1.compareTo(value2);
                if (ans < 0) {
                    tmpFile.write(line1 + "\r\n");
                    whichFileToRead = 1;
                } else if (ans > 0) {
                    tmpFile.write(line2 + "\r\n");
                    whichFileToRead = 2;
                } else if (ans == 0) {
                    tmpFile.write(line1 + "\r\n");
                    whichFileToRead = 1;
                }
            }

            if (whichFileToRead == 1) {
                line1 = bufferedReader1.readLine();
                if (line1 == null)
                    file_1_reader = false;
            } else {
                line2 = bufferedReader2.readLine();
                if (line2 == null)
                    file_2_reader = false;

            }
        }

        tmpFile.close();
        bufferedReader1.close();
        bufferedReader2.close();
        fileReader1.close();
        fileReader2.close();

    } catch (Exception e) {
        System.out.println(e);
    }

}
}

I am trying to merge sorted files using multithreading. Say I have 50 files and want to merge them all into one final sorted file. I want to speed this up and use every core through multithreading, but I haven't managed to do it. The files are big, so they can't be held in heap/RAM; I have to read each file and keep writing as I go.

You can do this with merge sort, but instead of lots of little sorted lists, you'll need to use lots of little sorted files. Once you have broken all of the files down into small sorted files, you can start merging them together again until you end up with a single sorted file.

Unfortunately, you likely won't be able to achieve high CPU utilisation, as much of the time will be spent waiting for disk I/O to complete.
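
As a rough illustration of merging in rounds, here is a minimal sketch rather than a definitive implementation. It assumes each input file is already sorted and that lines are ordered by their first 10 characters, as in the question's code. The names `ParallelMergeRounds`, `mergePair`, `mergeAll` and `key` are made up for this example, and the intermediate results go to temporary files instead of the question's `op.txt` naming. Each round submits every pair to a thread pool before waiting on any of them, carries an odd leftover file into the next round unchanged, and feeds the newly created files back in as the next round's input.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMergeRounds {

    // Assumption: lines are compared by their first 10 characters, as in the question.
    static String key(String line) {
        return line.length() > 10 ? line.substring(0, 10) : line;
    }

    /** Streams two sorted files into one sorted temporary file, line by line. */
    static Path mergePair(Path left, Path right) throws IOException {
        Path out = Files.createTempFile("merged", ".txt");
        try (BufferedReader r1 = Files.newBufferedReader(left);
             BufferedReader r2 = Files.newBufferedReader(right);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String a = r1.readLine();
            String b = r2.readLine();
            while (a != null || b != null) {
                // Take from the left file when the right one is exhausted or the left key is not larger.
                boolean takeLeft = b == null || (a != null && key(a).compareTo(key(b)) <= 0);
                if (takeLeft) {
                    w.write(a);
                    w.newLine();
                    a = r1.readLine();
                } else {
                    w.write(b);
                    w.newLine();
                    b = r2.readLine();
                }
            }
        }
        return out;
    }

    /** Merges the given sorted files in rounds and returns the path of the final file. */
    public static Path mergeAll(List<Path> sortedFiles)
            throws InterruptedException, ExecutionException {
        if (sortedFiles.isEmpty()) {
            throw new IllegalArgumentException("no files to merge");
        }
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Path> current = new ArrayList<>(sortedFiles);
            while (current.size() > 1) {
                // Submit every pair first, so the merges of one round can run concurrently.
                List<Future<Path>> pending = new ArrayList<>();
                for (int i = 0; i + 1 < current.size(); i += 2) {
                    Path left = current.get(i);
                    Path right = current.get(i + 1);
                    Callable<Path> task = () -> mergePair(left, right);
                    pending.add(pool.submit(task));
                }
                // Only now wait for the results; they become the input of the next round.
                List<Path> next = new ArrayList<>();
                for (Future<Path> f : pending) {
                    next.add(f.get());
                }
                // An odd file out is carried into the next round unchanged.
                if (current.size() % 2 == 1) {
                    next.add(current.get(current.size() - 1));
                }
                current = next;
            }
            return current.get(0);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `start()` and then `join()` on each thread inside the same loop, as the question's code does, waits for every merge before the next one begins, so nothing actually runs in parallel. The sketch avoids that by collecting the `Future`s and calling `get()` only after the whole round has been submitted. It does not delete the intermediate temporary files; a real version would probably want to.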

Edit: I just read your response to a comment, and it sounds like you are asking for help with the last step of the merge sort. The graphics in the wiki link above will also help you understand. So, assuming all of your files are sorted, here we go:

  1. Read 1 item from each file.
  2. Figure out which item is the lowest/smallest/whatever and write that line to the result file.
  3. Read a new item from the file that just provided the last item.
  4. Repeat steps 2 and 3 until all files have been completely read.
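
The four steps above can be sketched in Java roughly as follows. This is only an illustration under a few assumptions: the class name `KWayMerge`, the helper `key` and the `Entry` holder are invented for the example, and lines are ordered by their first 10 characters as in the question's code. A `PriorityQueue` does the "figure out which item is smallest" part, so only one line per input file is in memory at any time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

    /** One buffered line plus the reader it was read from. */
    private static final class Entry {
        final String line;
        final BufferedReader reader;

        Entry(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    // Assumption: lines are compared by their first 10 characters, as in the question.
    private static String key(String line) {
        return line.length() > 10 ? line.substring(0, 10) : line;
    }

    /** Merges already-sorted input files into a single sorted output file. */
    public static void merge(List<Path> inputs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<Entry> heap =
                new PriorityQueue<>(Comparator.comparing((Entry e) -> key(e.line)));
        try (BufferedWriter writer = Files.newBufferedWriter(output)) {
            // Step 1: read one line from each file.
            for (Path input : inputs) {
                BufferedReader reader = Files.newBufferedReader(input);
                readers.add(reader);
                String line = reader.readLine();
                if (line != null) {
                    heap.add(new Entry(line, reader));
                }
            }
            // Steps 2 to 4: write the smallest line, then refill from the file it came from.
            while (!heap.isEmpty()) {
                Entry smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.reader.readLine();
                if (next != null) {
                    heap.add(new Entry(next, smallest.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

This runs on a single thread, which may well be good enough given that the work is mostly waiting on disk I/O. It also keeps every input file open at once, so with a very large number of files you may need to merge them in batches.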
