
How to detect if a string in a file has been edited or added?

In the example below, I am able to identify the overall changes, but I cannot tell separately which strings were edited and which were added. Is there an algorithm or approach to detect whether a string was edited, added, or deleted within a file? I have tried the Java File Watcher, but that only detects whether a file was created, modified, or deleted. It does not report which changes were made within the file.

The diffFiles function only checks whether a string from the first file is also present in the second. I made a copy of the base file and check the differences against it:

public HashMap<String, Integer> diffFiles(List<String> firstFileContent, List<String> secondFileContent) throws IOException {
    int count = 0; // 1-based line number within the first file
    final HashMap<String, Integer> diff = new HashMap<>();
    for (final String line : firstFileContent) {
        count += 1;
        // record lines of the first file that do not occur anywhere in the second
        if (!secondFileContent.contains(line)) {
            diff.put(line, count);
        }
    }
    return diff;
}

I want to identify each string in the file individually and tell whether it was edited or added.

You can use the Checksum interface (java.util.zip.Checksum). A checksum is normally used to verify that a complete message has been received, that is, to ensure that no bit was lost.

Here are some ways you can do that:

Checksum

A checksum is a short, fixed-size value computed from your data.

Code:

var content = "this is my file content";
var b = content.getBytes(StandardCharsets.UTF_8); // needs java.nio.charset.StandardCharsets

To calculate the checksum for each of your files, use:

import java.util.zip.CRC32;
import java.util.zip.Checksum;

public static long getChecksum(byte[] bytes) {
    Checksum crc32 = new CRC32();
    crc32.update(bytes, 0, bytes.length);
    return crc32.getValue();
}

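If the two long values differ, the contents are different. If they are equal, the contents are almost certainly the same; CRC32 is only 32 bits wide, so a collision is unlikely but possible.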
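To make the comparison concrete, here is a minimal sketch that uses the checksum as a quick filter and then confirms the match byte by byte. The class name ChecksumCompare and the helpers crc32Of and sameContent are hypothetical names chosen for this illustration; only CRC32 and Arrays.equals come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper class, for illustration only.
class ChecksumCompare {

    // Computes the CRC32 checksum of the given bytes.
    static long crc32Of(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // Different checksums prove the contents differ; equal checksums are
    // confirmed with a full comparison to rule out a collision.
    static boolean sameContent(byte[] a, byte[] b) {
        return crc32Of(a) == crc32Of(b) && Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] base = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        byte[] edited = "line one\nline 2\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("unchanged copy matches: " + sameContent(base, base.clone()));
        System.out.println("edited copy matches: " + sameContent(base, edited));
    }
}
```

Note that this only answers "did the file change?". It does not say which line changed or how.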

Apache Commons Codec

You could also compute a SHA-256 hash with Apache Commons Codec:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.11</version>
</dependency>

And the validation is:

String sha = DigestUtils.sha256Hex(yourFullFileContentString);

If both strings (e.g. sha) are the same, the contents are, for all practical purposes, identical.

Guava Library

Google's Guava library offers the same functionality:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>20.0</version>
</dependency>

And here is the code:

var sha = Hashing.sha256()
  .hashString(yourFullFileContentString, StandardCharsets.UTF_8).toString();
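If adding a dependency is not an option, a similar hex digest can be produced with the JDK's own java.security.MessageDigest. This is a sketch of an alternative, not part of either library above; the class name Sha256Jdk and the method sha256Hex are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
class Sha256Jdk {

    // Returns the SHA-256 digest of the UTF-8 encoded content as lowercase hex.
    static String sha256Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Comparing sha256Hex(firstContent) with sha256Hex(secondContent) then works the same way as with the library calls.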

Which one to choose

I would choose the Checksum, because detecting a change does not require a hash that is intended as a security hash algorithm (SHA).

With your implementation of diffFiles() , you will get all the lines that are in the first file, but are missing in the second.

It won't give you all the lines that are in the second file, but not in the first file. And it will report lines that have moved their location in the second file as 'unchanged'.

And as you noticed already, you cannot determine whether a line was added/inserted or if an existing line was just modified (fixed a typo, for example).


What you are asking for is basically a Java implementation of the 'diff' tool, and StackOverflow already has a bunch of answers for that:

There might be more. Some of the answers just suggest using a library, while others do not go all the way to your desired solution, but all of them should give you an idea of how to proceed.
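As one possible starting point, here is a minimal sketch of a line-based diff built on the longest common subsequence (LCS), which is the classic idea behind diff-style tools. It is not the algorithm of any particular library, and the class name LineDiff and the method diff are hypothetical names for this example. The prefixes "  ", "- " and "+ " mark unchanged, deleted and added lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class LineDiff {

    // Returns one entry per line: "  " unchanged, "- " only in oldLines,
    // "+ " only in newLines, in document order.
    static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size();
        int m = newLines.size();
        // lcs[i][j] = length of the LCS of oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = oldLines.get(i).equals(newLines.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                out.add("  " + oldLines.get(i)); // line kept
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + oldLines.get(i++)); // line deleted
            } else {
                out.add("+ " + newLines.get(j++)); // line added
            }
        }
        while (i < n) out.add("- " + oldLines.get(i++)); // trailing deletions
        while (j < m) out.add("+ " + newLines.get(j++)); // trailing additions
        return out;
    }
}
```

A "- " entry immediately followed by a "+ " entry can reasonably be treated as an edited line, while a lone "+ " or "- " is a pure addition or deletion. The table needs memory proportional to the product of the two line counts, so this sketch suits small files; for large files a dedicated diff library is likely the better choice.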

The same links also appear in the right sidebar precisely because they are linked from here.
