Java, Linux: how to detect whether two java.io.Files refer to the same physical file

I'm looking for an efficient way to detect whether two java.io.Files refer to the same physical file. According to the docs, File.equals() should do the job:

Tests this abstract pathname for equality with the given object. Returns true if and only if the argument is not null and is an abstract pathname that denotes the same file or directory as this abstract pathname.

However, given a FAT32 partition (actually a TrueCrypt container) which is mounted at /media/truecrypt1:

new File("/media/truecrypt1/File").equals(new File("/media/truecrypt1/file")) == false

Would you say that this conforms to the specification? And in that case, how can I work around the problem?

Update: Thanks to the commenters. For Java 7 I've found java.nio.file.Files.isSameFile(), which works for me.

The answer in @Joachim's comment is normally correct. The way to determine whether two File objects refer to the same OS file is to use getCanonicalFile() or getCanonicalPath(). The javadoc says this:

"A canonical pathname is both absolute and unique. [...] Every pathname that denotes an existing file or directory has a unique canonical form."

So the following should work:

File f1 = new File("/media/truecrypt1/File");  // different capitalization ...
File f2 = new File("/media/truecrypt1/file");  // ... but same OS file (on Windows)
// getCanonicalPath() may throw IOException, which the caller must catch or declare
if (f1.getCanonicalPath().equals(f2.getCanonicalPath())) {
    System.out.println("Files are equal ... no kittens need to die.");
}

However, it appears that you are viewing a FAT32 file system mounted on UNIX / Linux. AFAIK, Java does not know that this is happening and just applies the generic UNIX / Linux rules for file names... which give the wrong answer in this scenario.

If this is what is really happening, I don't think there is a reliable solution in pure Java 6. However:

  • You could do some hairy JNI stuff; e.g. get the file descriptor numbers and then, in native code, use the fstat(2) system call to get hold of the two files' device and inode numbers and compare those.

  • Java 7's java.nio.file.Path.equals(Object) looks like it might give the right answer if you first call toRealPath() on the paths to resolve symlinks. (It is a little unclear from the javadoc whether each mounted filesystem on Linux will correspond to a distinct FileSystem object.)

  • The Java 7 tutorials have a section on checking whether two Path objects refer to the same file, which recommends using java.nio.file.Files.isSameFile(Path, Path).
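To make the toRealPath() bullet concrete, here is a minimal sketch assuming Java 7+ on a UNIX-like system (the class and method names are made up). Path.toRealPath() resolves symbolic links as well as . and .. segments and requires the file to exist, so two aliases of one file end up as equal Path objects. Whether it also folds letter case on a case-insensitive mount, such as the FAT32 container in the question, is not something the javadoc promises.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RealPathCompare {
    // Hypothetical helper: compares two paths after resolving symlinks, "." and "..".
    // toRealPath() throws IOException (e.g. NoSuchFileException) if a file is missing.
    static boolean sameByRealPath(Path a, Path b) throws IOException {
        return a.toRealPath().equals(b.toRealPath());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("realpath");
        Path target = Files.createFile(dir.resolve("data.txt"));
        // Assumes the platform allows creating symbolic links (true on typical Linux)
        Path link = Files.createSymbolicLink(dir.resolve("alias.txt"), target);

        System.out.println(link.equals(target));          // false: different path strings
        System.out.println(sameByRealPath(link, target)); // true: the link resolves to target
    }
}
```

A hard link, by contrast, has its own real path, so this comparison reports two hard links to one file as different.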


Would you say that this conforms to the specification?

No and yes.

  • No, in the sense that the getCanonicalPath() method does not return the same value for each existing OS file... which is what you'd expect from reading the javadoc.

  • Yes, in the technical sense that the Java codebase (not the javadoc) is the ultimate specification... both in theory and in practice.

You could try to obtain an exclusive write lock on the file and see whether that fails:

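import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

boolean isSame;
// file1 and file2 are the two java.io.File objects being compared.
// RandomAccessFile in "rw" mode does not truncate the file (FileOutputStream would),
// but it creates a missing file and fails on read-only files.
try (RandomAccessFile raf1 = new RandomAccessFile(file1, "rw");
     RandomAccessFile raf2 = new RandomAccessFile(file2, "rw");
     FileLock lock1 = raf1.getChannel().tryLock()) {
    if (lock1 == null) {
        isSame = false; // another process holds the lock: inconclusive
    } else {
        // If both channels reach the same underlying file, this JVM already holds an
        // overlapping lock, so tryLock() throws OverlappingFileLockException.
        try (FileLock lock2 = raf2.getChannel().tryLock()) {
            isSame = false; // second lock granted (or held elsewhere): different files
        } catch (OverlappingFileLockException e) {
            isSame = true;
        }
    }
} catch (IOException e) {
    isSame = false;
}
System.out.println(file1 + " and " + file2 + " are " + (isSame ? "" : "not ") + "the same");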

This is not always guaranteed to be correct, though, because another process could already have obtained the lock, making the attempt fail for you. But at least this doesn't require you to shell out to an external process.

There's a chance the same file has two paths (e.g. over the network, \\localhost\file and \\127.0.0.1\file refer to the same file via different paths). I would go with comparing digests of both files to determine whether they are identical. Something like this:

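import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class CompareFiles { // illustrative class name
    public static void main(String[] args) {
        try {
            File f1 = new File("\\\\79.129.94.116\\share\\bots\\triplon_bots.jar");
            File f2 = new File("\\\\triplon\\share\\bots\\triplon_bots.jar");
            // Path comparison: different host names yield different canonical paths
            System.out.println(f1.getCanonicalPath().equals(f2.getCanonicalPath()));
            // Content comparison: identical bytes yield identical digests
            System.out.println(computeDigestOfFile(f1).equals(computeDigestOfFile(f2)));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static String computeDigestOfFile(File f) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // DigestInputStream updates md with every byte it reads, so an extra
        // md.update(buffer) would hash the data twice (plus stale buffer bytes)
        try (InputStream is = new DigestInputStream(new FileInputStream(f), md)) {
            byte[] buffer = new byte[1024];
            while (is.read(buffer) != -1) {
                // keep reading; the stream feeds the digest
            }
        }
        return new BigInteger(1, md.digest()).toString(16);
    }
}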

It outputs:

false
true

This approach is, of course, much slower than any kind of path comparison, and its cost grows with the size of the files. Another possible side effect is that two different files with identical contents will be considered equal, regardless of their locations.
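If you do fall back to comparing contents, a cheaper sketch (assuming Java 12+, where java.nio.file.Files.mismatch was added) is to compare sizes first and then scan the bytes, stopping at the first difference instead of hashing both files completely. It keeps the caveat above: two distinct files with identical contents still compare equal. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentCompare {
    // Hypothetical helper: true if both files contain exactly the same bytes.
    static boolean sameContent(Path a, Path b) throws IOException {
        // Files of different sizes can never match; Files.size() reads metadata only
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        // Files.mismatch returns the first differing byte position, or -1L if none
        return Files.mismatch(a, b) == -1L;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("content");
        Path a = Files.write(dir.resolve("a.txt"), "hello".getBytes());
        Path b = Files.write(dir.resolve("b.txt"), "hello".getBytes());
        Path c = Files.write(dir.resolve("c.txt"), "hellO".getBytes());
        System.out.println(sameContent(a, b)); // true, although a and b are different files
        System.out.println(sameContent(a, c)); // false
    }
}
```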

The Files.isSameFile method was added for exactly this kind of usage: checking whether two non-equal paths locate the same file.
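A minimal sketch of how Files.isSameFile behaves (Java 7+), using temporary files so it can run anywhere; the class and file names are illustrative. Per its javadoc, two equal paths return true without checking that the file exists, while non-equal paths have to be looked up in the file system, so a missing file surfaces as an IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SameFileDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("samefile");
        Path file = Files.createFile(dir.resolve("data.txt"));
        Path alias = dir.resolve(".").resolve("data.txt"); // non-equal path, same file
        Path other = Files.createFile(dir.resolve("other.txt"));

        System.out.println(file.equals(alias));            // false
        System.out.println(Files.isSameFile(file, alias)); // true
        System.out.println(Files.isSameFile(file, other)); // false

        // Equal paths short-circuit to true without touching the file system...
        Path missing = dir.resolve("missing.txt");
        System.out.println(Files.isSameFile(missing, missing)); // true
        // ...but non-equal paths are looked up, so a missing file is an error
        try {
            Files.isSameFile(missing, file);
        } catch (IOException e) {
            System.out.println("cannot compare: " + e);
        }
    }
}
```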

On *nix systems, case does matter: file is not the same as File or fiLe.

The API doc of equals() says (right after your quote):

On UNIX systems, alphabetic case is significant in comparing pathnames; on Microsoft Windows systems it is not.
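A small sketch (class name illustrative; the paths need not exist) showing that File.equals() is a plain comparison of abstract pathnames: on a UNIX-like system the case difference alone makes the two File objects unequal, whatever the mounted file system does, and a redundant . segment is not normalized away either.

```java
import java.io.File;

public class FileEqualsDemo {
    public static void main(String[] args) {
        File upper = new File("/media/truecrypt1/File");
        File lower = new File("/media/truecrypt1/file");
        File dotted = new File("/media/truecrypt1/./file");

        // Pathname comparison only; nothing is looked up on disk here.
        // On UNIX-like systems both lines print false (on Windows the first would be true).
        System.out.println(upper.equals(lower));
        System.out.println(lower.equals(dotted));
    }
}
```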

You can try running these commands via Runtime.exec():

ls -i /fullpath/File # extract the inode number.
df /fullpath/File # extract the "Mounted on" field.

If the mount point and the inode number are the same, they are the same file, whether you have symbolic links or case-insensitive file systems.
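A rough Java sketch of that idea, assuming POSIX-style ls and df are on the PATH. It uses ProcessBuilder, which starts the same kind of external process as Runtime.exec() but passes each argument separately. The helper names are hypothetical, -L is added so that a symbolic link reports its target's inode, and the output parsing is deliberately naive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class InodeViaShell {
    // Runs a command and returns its output lines; fails on a non-zero exit status.
    static List<String> run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        if (p.waitFor() != 0) {
            throw new IOException("command failed: " + String.join(" ", cmd) + " " + lines);
        }
        return lines;
    }

    // "ls -d -i -L PATH" prints "<inode> PATH"; -L follows symlinks to the target.
    static String inode(String path) throws IOException, InterruptedException {
        return run("ls", "-d", "-i", "-L", path).get(0).trim().split("\\s+")[0];
    }

    // "df -P PATH" prints a header plus one data line whose last field is "Mounted on".
    // This naive split breaks if the mount point itself contains spaces.
    static String mountPoint(String path) throws IOException, InterruptedException {
        List<String> out = run("df", "-P", path);
        String[] fields = out.get(out.size() - 1).trim().split("\\s+");
        return fields[fields.length - 1];
    }

    // Same file if both the inode number and the mount point match.
    static boolean sameFile(String a, String b) throws IOException, InterruptedException {
        return inode(a).equals(inode(b)) && mountPoint(a).equals(mountPoint(b));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameFile(args[0], args[1]));
    }
}
```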

Or even, in bash:

test "file1" -ef "file2"

Per the test(1) man page, FILE1 -ef FILE2 is true when FILE1 and FILE2 have the same device and inode numbers.
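The same check can be delegated from Java to the external test utility, assuming one that supports -ef (GNU coreutils does). This is only a sketch with hypothetical names; the arguments are passed directly rather than through a shell, so spaces in file names need no quoting, and the exit status is the only result.

```java
import java.io.IOException;

public class TestEfDemo {
    // Hypothetical helper: runs "test FILE1 -ef FILE2" and reports its exit status.
    // Exit status 0 means both names have the same device and inode numbers.
    static boolean sameFile(String a, String b) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("test", a, "-ef", b).start();
        return p.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameFile(args[0], args[1]));
    }
}
```

Run it as java TestEfDemo FILE1 FILE2.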

The traditional Unix way to test whether two filenames refer to the same underlying filesystem object is to stat them and test whether they have the same [dev,ino] pair.
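Since Java 7 the [dev,ino] test can be approximated without JNI or external processes through BasicFileAttributes.fileKey(). The javadoc only promises an object that uniquely identifies the file (or null if none is available, and unique only while the file system stays static); in OpenJDK on UNIX-like systems it is built from st_dev and st_ino, which is the assumption this sketch relies on. Names are illustrative. The demo uses a hard link, a case that path-based comparisons miss.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class FileKeyCompare {
    // Hypothetical helper: compares file keys (follows symlinks, like stat(2)).
    // Returns false when the platform provides no file key.
    static boolean sameByFileKey(Path a, Path b) throws IOException {
        Object ka = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
        Object kb = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
        return ka != null && ka.equals(kb);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("filekey");
        Path file = Files.createFile(dir.resolve("data.txt"));
        Path hardLink = Files.createLink(dir.resolve("hard.txt"), file);
        Path other = Files.createFile(dir.resolve("other.txt"));
        System.out.println(sameByFileKey(file, hardLink)); // true on Linux: same inode
        System.out.println(sameByFileKey(file, other));    // false
    }
}
```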

That does assume no redundant mounts, however. If those are allowed, you have to go about it differently.

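Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.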

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM