
What is the fastest way to compare the content of two BIG text files in Java?

I have two text files, each larger than 600MB, and I want to check whether their contents are the same, ignoring any whitespace at the start or end of each line (i.e. trim() each line). I am thinking of reading each line as a string, trimming it, and comparing it with the corresponding line of the other file.

Is there a better idea? If not, what is the fastest implementation of this one? Thanks in advance.

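If you only need to know whether the two files are byte-for-byte identical, compute the MD5 digest of each file and compare the two values. Note that this does not ignore leading or trailing whitespace on lines: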

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MainServer {
  public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
    String filePath1 = "D:\\Download\\a.mp3";
    String filePath2 = "D:\\Download\\b.mp3";
    String file1_md5 = md5HashCode(filePath1);
    String file2_md5 = md5HashCode(filePath2);

    System.out.println(file1_md5);
    System.out.println(file2_md5);
    if (file1_md5.equals(file2_md5)) {
        System.out.println("Two files are the same");
    } else {
        System.out.println("Two files are different");
    }
  }

  /**
   * Returns the MD5 digest of the file as a 32-character hex string.
   * Errors propagate to the caller, so two unreadable files are never
   * reported as equal.
   */
  public static String md5HashCode(String filePath) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    // try-with-resources closes the stream even if reading fails
    try (InputStream fis = new FileInputStream(filePath)) {
        byte[] buffer = new byte[8192];
        int length;
        while ((length = fis.read(buffer)) != -1) {
            md.update(buffer, 0, length);
        }
    }
    // Build the hex string byte by byte so leading zeros are kept
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
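
For an exact byte-for-byte check, hashing both files is not strictly required. The following is a minimal sketch, assuming Java 12 or later, where `Files.mismatch` is available; the class and method names are illustrative and not part of the answer above. Like the MD5 approach, it does not ignore whitespace at the start or end of a line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExactFileCompare {
    // Returns true when both files contain exactly the same bytes.
    // Files.mismatch (Java 12+) returns -1 when no mismatch is found,
    // otherwise the position of the first differing byte.
    static boolean sameBytes(Path first, Path second) throws IOException {
        return Files.mismatch(first, second) == -1L;
    }
}
```

It could be called as `ExactFileCompare.sameBytes(Paths.get("D:/a.txt"), Paths.get("D:/b.txt"))`. Because the comparison can stop at the first differing byte, it may avoid reading both files to the end when they differ early.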

If you need to compare the files line by line, trimming each line first:

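    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    // Note: readAllLines loads both files fully into memory.
    boolean same;
    try {
        List<String> file1_lines = Files.readAllLines(Paths.get("D:/a.txt"), StandardCharsets.UTF_8);
        List<String> file2_lines = Files.readAllLines(Paths.get("D:/b.txt"), StandardCharsets.UTF_8);

        // Files with different line counts cannot be equal.
        same = file1_lines.size() == file2_lines.size();
        for (int i = 0; same && i < file1_lines.size(); i++) {
            String file1_line = file1_lines.get(i).trim();
            String file2_line = file2_lines.get(i).trim();
            if (!file1_line.equals(file2_line)) {
                same = false; // first mismatch is at line i + 1
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        same = false; // treat unreadable files as not equal
    }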
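
For files of 600MB or more, reading every line into a list may use a lot of memory. A streaming variant of the same trim-and-compare idea can be sketched as follows. It assumes both files are UTF-8, as in the snippet above; the class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TrimmedFileCompare {
    // Reads both files line by line and compares the trimmed lines,
    // so only one line per file is held in memory at a time.
    static boolean sameIgnoringEdgeWhitespace(Path first, Path second) throws IOException {
        try (BufferedReader r1 = Files.newBufferedReader(first, StandardCharsets.UTF_8);
             BufferedReader r2 = Files.newBufferedReader(second, StandardCharsets.UTF_8)) {
            while (true) {
                String line1 = r1.readLine();
                String line2 = r2.readLine();
                if (line1 == null || line2 == null) {
                    // Equal only if both files ended on the same line.
                    return line1 == null && line2 == null;
                }
                if (!line1.trim().equals(line2.trim())) {
                    return false; // stop at the first differing line
                }
            }
        }
    }
}
```

This returns as soon as a line differs, so files that diverge early are rejected quickly. An extra blank line at the end of one file counts as a difference in this sketch; whether that is desired depends on the use case.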
