
How to monitor file modifications and know what changes were made

I'm working on a Java project where I need to monitor the files in a certain directory and be notified whenever a change is made to one of them; this can be achieved using WatchService. Furthermore, I want to know what changes were made, for example: "characters 10 to 15 were removed", "at index 13 characters 'abcd' were added"... I'm willing to take any solution, even one written in C that monitors the file system. I also want to avoid a diff-based solution, both to avoid storing the same file twice and because of the algorithm's complexity: it takes too much time for big files. Thank you for your help. :)

If you're using Linux, the following code will detect changes in file length; you can easily extend it to track the modifications.

Because you don't want to keep two copies of the file, there is no way to tell which characters were altered if either the file length is reduced (the lost characters can't be recovered) or the file was altered somewhere in the middle.

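#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* lseek(), close(), sleep() */

int main(void)
{
    int fd = open("test", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    off_t length = lseek(fd, 0, SEEK_END);   /* offset of SEEK_END == file size */
    while (1)
    {
        off_t new_length;
        sleep(1);
        /* Reopen on every poll so that a file replaced on disk (e.g. saved
           by an editor via rename) is picked up instead of the old inode */
        close(fd);
        fd = open("test", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        new_length = lseek(fd, 0, SEEK_END);
        printf("new_length = %lld\n", (long long)new_length);
        if (new_length != length)
            printf("Length changed! %lld->%lld\n", (long long)length, (long long)new_length);
        length = new_length;
    }
}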
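The one case the polling loop can extend cleanly is growth at the end of the file: the new bytes live in the range from the old length to the new length, so they can be read back without keeping a second copy. A minimal sketch, assuming the file only grew by appending; `read_appended` is a hypothetical helper name, and `pread()` is used so the descriptor's own offset is left untouched.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Reads the bytes appended between two polls, i.e. the range
 * [old_len, new_len). Returns the number of bytes stored in out
 * (at most out_size), 0 if the file did not grow, or -1 on error.
 * Only meaningful for pure appends: a mid-file edit or a truncation
 * cannot be reconstructed this way.
 */
ssize_t read_appended(int fd, off_t old_len, off_t new_len,
                      char *out, size_t out_size)
{
    if (new_len <= old_len)
        return 0;                          /* nothing was appended */

    size_t want = (size_t)(new_len - old_len);
    if (want > out_size)
        want = out_size;                   /* caller's buffer limits the report */

    /* pread() reads at an explicit offset without moving fd's position */
    return pread(fd, out, want, old_len);
}
```

Inside the loop above, when `new_length > length`, a call such as `read_appended(fd, length, new_length, buf, sizeof buf)` gives the text for a report like "at index `length`, characters '...' were added".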

[EDIT]
Since the author accepts changes to the kernel for this task, the following change to vfs_write should do the trick:

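#define MAX_DIFF_LENGTH 128

/* Debug counter: only the first 20 writes are logged so dmesg isn't flooded.
   Not atomic, which is acceptable for a rough debugging cap. */
static int ___ishay;

ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
    char old_content[MAX_DIFF_LENGTH+1];
    char new_content[MAX_DIFF_LENGTH+1];
    ssize_t ret;

    if (!(file->f_mode & FMODE_WRITE))
        return -EBADF;
    if (!file->f_op || (!file->f_op->write && !file->f_op->aio_write))
        return -EINVAL;
    if (unlikely(!access_ok(VERIFY_READ, buf, count)))
        return -EFAULT;

    ret = rw_verify_area(WRITE, file, pos, count);
    /* Only log writes that passed verification and will actually happen */
    if (ret >= 0 && ___ishay < 20)
    {
        int i;
        int length = count > MAX_DIFF_LENGTH ? MAX_DIFF_LENGTH : count;
        ___ishay++;

        memset(old_content, 0, sizeof(old_content));
        memset(new_content, 0, sizeof(new_content));

        /* Read the bytes about to be overwritten. kernel_read() (pre-4.14
           signature: file, offset, kernel buffer, count) takes the offset by
           value, so *pos is not advanced and the write still lands at *pos.
           It accepts a kernel buffer, unlike a bare vfs_read() call, and
           fails with -EBADF (leaving old_content zeroed) if the file was not
           opened for reading. */
        kernel_read(file, *pos, old_content, length);

        /* buf points to user space, so it must be copied in, not memcpy()'d */
        if (copy_from_user(new_content, buf, length))
            length = 0;

        printk(KERN_ERR "[___ISHAY___]Write request for file named: %s count: %zu pos: %lld:\n",
                file->f_path.dentry->d_name.name,
                count,
                (long long)*pos);
        printk(KERN_ERR "[___ISHAY___]New content (replacement) <%d>:\n", length);

        for (i = 0; i < length; i++)
        {
            unsigned char c = new_content[i];
            /* non-printable bytes are shown as '.' */
            printk(KERN_CONT "[0x%02x] (%c)", c, (c >= 32 && c < 127) ? c : '.');
            if ((i + 1) % 10 == 0)      /* line break every 10 bytes */
                printk(KERN_CONT "\n");
        }
        printk(KERN_ERR "[___ISHAY___]Old content (on file now):\n");
        for (i = 0; i < length; i++)
        {
            unsigned char c = old_content[i];
            printk(KERN_CONT "[0x%02x] (%c)", c, (c >= 32 && c < 127) ? c : '.');
            if ((i + 1) % 10 == 0)
                printk(KERN_CONT "\n");
        }
    }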
    if (ret >= 0) {
        count = ret;
        if (file->f_op->write)
            ret = file->f_op->write(file, buf, count, pos);
        else
            ret = do_sync_write(file, buf, count, pos);
        if (ret > 0) {
            fsnotify_modify(file);
            add_wchar(current, ret);
        }
        inc_syscw(current);
    }

    return ret;
}

Explanation:
vfs_write is the function that handles write requests for files, so that's our best central hook to catch modification requests for files before they occur.
vfs_write accepts the file, file position, buffer and length for the write operation, so we know what part of the file will be replaced by this write, and what data will replace it.

Since we know what part of the file will be altered, I added a read of that region (via vfs_read) just before the actual write, to keep in memory the part of the file we are about to overwrite.
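With the old bytes (from the read) and the new bytes (from the write buffer) both in memory, turning them into a report such as "at index 12, 2 characters were replaced" is a single linear comparison limited to the write region, not a whole-file diff. A user-space sketch of that step; `diff_write` and `struct write_change` are hypothetical names, not kernel APIs. Note that a write cannot express an insertion: inserting text in the middle of a file shows up as the whole tail being rewritten.

```c
#include <stddef.h>

/* What one write actually changes (hypothetical helper type). */
struct write_change {
    long long first;   /* file offset of the first differing byte, or -1 */
    size_t changed;    /* bytes from 'first' through the last differing byte */
    size_t appended;   /* bytes written past the old end of file */
};

/*
 * pos:       file offset of the write
 * old_bytes: what the read returned before the write; old_len may be
 *            shorter than new_len when the write runs past EOF
 * new_bytes: the buffer handed to write(), new_len bytes long
 */
struct write_change diff_write(long long pos,
                               const char *old_bytes, size_t old_len,
                               const char *new_bytes, size_t new_len)
{
    struct write_change c = { -1, 0, 0 };
    size_t overlap = old_len < new_len ? old_len : new_len;
    size_t lo = 0, hi = overlap;

    while (lo < overlap && old_bytes[lo] == new_bytes[lo])
        lo++;                                  /* skip unchanged prefix */
    while (hi > lo && old_bytes[hi - 1] == new_bytes[hi - 1])
        hi--;                                  /* skip unchanged suffix */

    if (lo < hi) {
        c.first = pos + (long long)lo;
        c.changed = hi - lo;
    }
    c.appended = new_len > old_len ? new_len - old_len : 0;
    return c;
}
```

For example, overwriting "abcdef" at offset 10 with "abXYef" yields `first == 12`, `changed == 2`, `appended == 0`.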

This should be a good starting point for getting what you need. I made the following simplifications, as this is only an example:

  • The buffers are fixed-size arrays of at most 128 bytes on the stack (they should be allocated dynamically, with a cap so that huge write requests don't waste too much memory)
  • The file length should be checked and the read buffer limited accordingly; the current code prints a full-length read buffer even if the write extends beyond the end of the file
  • The output currently goes to dmesg. A better implementation would keep a cyclic buffer accessible through debugfs, possibly with poll() support
  • The current code captures writes to ALL files, which is probably not what you want...
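The cyclic buffer suggested in the list above can be sketched as a fixed array of change records in which the newest record overwrites the oldest once the array is full. This is a user-space sketch of the data structure only; `struct change_ring`, `ring_push`, `ring_get` and the capacities are hypothetical. Inside the kernel it would additionally need a lock (e.g. a spinlock) around push and read, plus the debugfs plumbing to expose it, neither of which is shown.

```c
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 4          /* hypothetical capacity */
#define NAME_MAX_LEN 32       /* hypothetical max stored file-name length */

struct change_record {
    char name[NAME_MAX_LEN];  /* file name, truncated if too long */
    long long pos;            /* write offset */
    size_t count;             /* bytes written */
};

struct change_ring {
    struct change_record slots[RING_SLOTS];
    size_t head;              /* next slot to fill */
    size_t used;              /* valid records, at most RING_SLOTS */
};

/* Appends a record, overwriting the oldest one when the ring is full. */
void ring_push(struct change_ring *r, const char *name, long long pos, size_t count)
{
    struct change_record *rec = &r->slots[r->head];
    strncpy(rec->name, name, NAME_MAX_LEN - 1);
    rec->name[NAME_MAX_LEN - 1] = '\0';     /* strncpy may not terminate */
    rec->pos = pos;
    rec->count = count;
    r->head = (r->head + 1) % RING_SLOTS;
    if (r->used < RING_SLOTS)
        r->used++;
}

/* Returns the i-th oldest record still held (0 = oldest), or NULL. */
const struct change_record *ring_get(const struct change_ring *r, size_t i)
{
    if (i >= r->used)
        return NULL;
    size_t oldest = (r->head + RING_SLOTS - r->used) % RING_SLOTS;
    return &r->slots[(oldest + i) % RING_SLOTS];
}
```

Overwriting the oldest entry keeps memory bounded no matter how many writes happen; a reader that falls behind simply loses the oldest records instead of blocking writers.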

[EDIT2]
I forgot to mention where this function is located: it's in fs/read_write.c in the kernel tree.

[EDIT3] There's another possible solution, provided you know which program you want to monitor and that it doesn't link libc statically: use LD_PRELOAD to override the write function, use that as your hook, and record the changes there. I haven't tried this, but there's no reason why it shouldn't work.
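An untested sketch of that LD_PRELOAD hook, assuming glibc on Linux: the library defines its own `write()`, logs the offset and the bytes about to be overwritten, then forwards to the real `write()` found with `dlsym(RTLD_NEXT, "write")`. `PEEK_MAX` and `hooked_writes` are hypothetical names. As far as I know, stdio output (`fprintf`, `fwrite`) goes through glibc's internal write and is not caught this way, and `pwrite()`/`writev()` would need hooks of their own.

```c
#define _GNU_SOURCE           /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PEEK_MAX 64           /* hypothetical cap on old bytes captured */

/* Number of hooked writes to descriptors other than stderr */
static unsigned long hooked_writes;

typedef ssize_t (*write_fn)(int, const void *, size_t);

/* Interposes libc's write(): with LD_PRELOAD the dynamic linker binds
   the program's write() calls to this definition first. */
ssize_t write(int fd, const void *buf, size_t count)
{
    static write_fn real_write;
    if (!real_write)
        real_write = (write_fn)dlsym(RTLD_NEXT, "write");

    /* Skip stderr so the logging below can never recurse into the hook */
    if (fd != STDERR_FILENO) {
        char old[PEEK_MAX];   /* a real tool would store these bytes */
        size_t n = count < PEEK_MAX ? count : PEEK_MAX;
        /* -1 for pipes/sockets; not the real target for O_APPEND files */
        off_t pos = lseek(fd, 0, SEEK_CUR);
        /* bytes about to be overwritten: 0 at EOF, -1 if fd is write-only */
        ssize_t old_n = pos >= 0 ? pread(fd, old, n, pos) : -1;

        hooked_writes++;
        fprintf(stderr, "[hook] fd=%d pos=%lld new=%zu old_bytes_seen=%zd\n",
                fd, (long long)pos, count, old_n);
    }
    return real_write(fd, buf, count);
}
```

It would be built with something like `gcc -shared -fPIC -o libwritehook.so writehook.c -ldl` (`-ldl` is unnecessary on glibc 2.34 and later) and used as `LD_PRELOAD=./libwritehook.so ./your_program`.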
