
Best way to determine if two paths refer to the same file in C#

In the upcoming Java 7, there is a new API to check whether two file objects refer to the same file.

Is there a similar API in the .NET Framework?

I've searched MSDN, but found nothing helpful.

I want to keep it simple, but I don't want to compare by file name, which causes problems with hard/symbolic links and different path styles (e.g. \\?\C:\ vs. C:\).

All I want to do is prevent duplicate files from being dragged and dropped onto my link list.

As far as I can see (1) (2) (3) (4), JDK 7 does it by calling GetFileInformationByHandle on the files and comparing dwVolumeSerialNumber, nFileIndexHigh and nFileIndexLow.

Per MSDN:

You can compare the VolumeSerialNumber and FileIndex members returned in the BY_HANDLE_FILE_INFORMATION structure to determine if two paths map to the same target; for example, you can compare two file paths and determine if they map to the same directory.

I do not think this function is wrapped by .NET, so you will have to use P/Invoke.

It might or might not work for network files. According to MSDN:

Depending on the underlying network components of the operating system and the type of server connected to, the GetFileInformationByHandle function may fail, return partial information, or full information for the given file.

A quick test shows that it works as expected (same values) with a symbolic link on a Linux system connected using SMB/Samba, but that it cannot detect that a file is the same when it is accessed through different shares that point to the same file (FileIndex is the same, but VolumeSerialNumber differs).

Edit: Note that @Rasmus Faber mentions the GetFileInformationByHandle function in the Win32 API, which does what you want; check and upvote his answer for more information.


I think you need an OS function to give you the information you want; otherwise you are going to get some false negatives whatever you do.

For instance, do these refer to the same file?

  • \\server\share\path\filename.txt
  • \\server\d$\temp\path\filename.txt

I would examine how critical it is for you not to have duplicate files in your list, and then just make a best effort.

Having said that, there is a method in the Path class that can do some of the work: Path.GetFullPath. It will at least expand the path to long names, according to the existing structure. Afterwards you just compare the strings. It won't be foolproof though, and it won't handle the two links in my example above.
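As a rough illustration of that best-effort approach, the sketch below normalizes both paths with Path.GetFullPath and then compares the strings. The names PathComparison and LooksLikeSamePath are made up for this example, and the case-insensitive comparison is an assumption that fits a typical Windows volume. It does not follow links, and it will not recognize two shares that expose the same folder.

```csharp
using System;
using System.IO;

public static class PathComparison
{
    // Hypothetical helper, not part of the .NET Framework.
    // Best effort only: normalizes both paths and compares the strings.
    public static bool LooksLikeSamePath(string path1, string path2)
    {
        // GetFullPath makes the path absolute and collapses segments
        // such as "." and ".."; it does not resolve links.
        string full1 = Path.GetFullPath(path1);
        string full2 = Path.GetFullPath(path2);

        // Assumption: the file system ignores case (the Windows default).
        return string.Equals(full1, full2, StringComparison.OrdinalIgnoreCase);
    }
}
```

A call such as PathComparison.LooksLikeSamePath(@"C:\Temp\..\Temp\a.txt", @"c:\temp\A.TXT") would report a match, while the two UNC paths from the example above would still be treated as different.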

Here is a C# implementation of IsSameFile using GetFileInformationByHandle:

NativeMethods.cs

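using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;
// FILETIME is the 8-byte COM interop struct (two 32-bit fields).
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

public static class NativeMethods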
{
  [StructLayout(LayoutKind.Explicit)]
  public struct BY_HANDLE_FILE_INFORMATION
  {
    [FieldOffset(0)]
    public uint FileAttributes;

    [FieldOffset(4)]
    public FILETIME CreationTime;

    [FieldOffset(12)]
    public FILETIME LastAccessTime;

    [FieldOffset(20)]
    public FILETIME LastWriteTime;

    [FieldOffset(28)]
    public uint VolumeSerialNumber;

    [FieldOffset(32)]
    public uint FileSizeHigh;

    [FieldOffset(36)]
    public uint FileSizeLow;

    [FieldOffset(40)]
    public uint NumberOfLinks;

    [FieldOffset(44)]
    public uint FileIndexHigh;

    [FieldOffset(48)]
    public uint FileIndexLow;
  }

  [DllImport("kernel32.dll", SetLastError = true)]
  public static extern bool GetFileInformationByHandle(SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION lpFileInformation);

  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  public static extern SafeFileHandle CreateFile([MarshalAs(UnmanagedType.LPTStr)] string filename,
    [MarshalAs(UnmanagedType.U4)] FileAccess access,
    [MarshalAs(UnmanagedType.U4)] FileShare share,
    IntPtr securityAttributes,
    [MarshalAs(UnmanagedType.U4)] FileMode creationDisposition,
    [MarshalAs(UnmanagedType.U4)] FileAttributes flagsAndAttributes,
    IntPtr templateFile);
}

PathUtility.cs

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class PathUtility
{
  // Returns true when both paths resolve to the same file on the same volume.
  public static bool IsSameFile(string path1, string path2)
  {
    using (SafeFileHandle sfh1 = NativeMethods.CreateFile(path1, FileAccess.Read, FileShare.ReadWrite,
        IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero))
    {
      // CreateFile sets the last Win32 error when it returns an invalid handle.
      if (sfh1.IsInvalid)
        Marshal.ThrowExceptionForHR(Marshal.GetHRForLastWin32Error());

      using (SafeFileHandle sfh2 = NativeMethods.CreateFile(path2, FileAccess.Read, FileShare.ReadWrite,
        IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero))
      {
        if (sfh2.IsInvalid)
          Marshal.ThrowExceptionForHR(Marshal.GetHRForLastWin32Error());

        NativeMethods.BY_HANDLE_FILE_INFORMATION fileInfo1;
        bool result1 = NativeMethods.GetFileInformationByHandle(sfh1, out fileInfo1);
        if (!result1)
          throw new IOException(string.Format("GetFileInformationByHandle has failed on {0}", path1));

        NativeMethods.BY_HANDLE_FILE_INFORMATION fileInfo2;
        bool result2 = NativeMethods.GetFileInformationByHandle(sfh2, out fileInfo2);
        if (!result2)
          throw new IOException(string.Format("GetFileInformationByHandle has failed on {0}", path2));

        // Same volume serial number and same 64-bit file index means same file.
        return fileInfo1.VolumeSerialNumber == fileInfo2.VolumeSerialNumber
          && fileInfo1.FileIndexHigh == fileInfo2.FileIndexHigh
          && fileInfo1.FileIndexLow == fileInfo2.FileIndexLow;
      }
    }
  }
}
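For the drag-and-drop case from the question, calling IsSameFile against every existing entry gets slow as the list grows. One possible alternative, sketched here under the assumption that you read the volume serial number and file index once per dropped file, is to keep those three values as a key in a HashSet. FileIdentity and DroppedFileList are hypothetical names invented for this sketch; the numbers passed in would come from GetFileInformationByHandle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type holding the three fields that IsSameFile compares.
public readonly struct FileIdentity : IEquatable<FileIdentity>
{
    public uint VolumeSerialNumber { get; }
    public ulong FileIndex { get; }

    public FileIdentity(uint volumeSerialNumber, uint fileIndexHigh, uint fileIndexLow)
    {
        VolumeSerialNumber = volumeSerialNumber;
        // Combine the two 32-bit halves into one 64-bit file index.
        FileIndex = ((ulong)fileIndexHigh << 32) | fileIndexLow;
    }

    public bool Equals(FileIdentity other)
    {
        return VolumeSerialNumber == other.VolumeSerialNumber
            && FileIndex == other.FileIndex;
    }

    public override bool Equals(object obj)
    {
        return obj is FileIdentity && Equals((FileIdentity)obj);
    }

    public override int GetHashCode()
    {
        return VolumeSerialNumber.GetHashCode() ^ FileIndex.GetHashCode();
    }
}

// Hypothetical list that refuses a second path to an already listed file.
public sealed class DroppedFileList
{
    private readonly HashSet<FileIdentity> seen = new HashSet<FileIdentity>();
    private readonly List<string> paths = new List<string>();

    public IReadOnlyList<string> Paths { get { return paths; } }

    // Returns false, and adds nothing, when the identity is already known.
    public bool TryAdd(string path, FileIdentity identity)
    {
        if (!seen.Add(identity))
            return false;
        paths.Add(path);
        return true;
    }
}
```

The identity is only a snapshot: it says nothing about files that could not be opened, and the network share caveat mentioned earlier (same FileIndex, different VolumeSerialNumber) applies here as well.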

Answer: There is no foolproof way to compare two string-based paths to determine whether they point to the same file.

The main reason is that seemingly unrelated paths can point to the exact same file due to file system redirections (junctions, symbolic links, etc.). For example:

"d:\temp\foo.txt" "c:\othertemp\foo.txt"

These paths can potentially point to the same file. This case clearly rules out any string comparison function as a basis for determining whether two paths point to the same file.

The next level is comparing the OS file information. Open the file for both paths and compare the handle information. On Windows this can be done with GetFileInformationByHandle. Lucian Wischik wrote an excellent post on this subject here.

There is still a problem with this approach though. It only works if the user account performing the check is able to open both files for reading. There are numerous things that can prevent a user from opening one or both files, including but not limited to:

  • Lack of sufficient permissions on the file
  • Lack of sufficient permissions on a directory in the path of the file
  • A file system change that occurs between the opening of the first file and the second, such as a network disconnection

When you start looking at all of these problems, you begin to understand why Windows does not provide a method to determine whether two paths are the same. It's just not an easy, or even always possible, question to answer.
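One way to live with those failure modes is to stop treating the check as a yes/no question. The sketch below is only an illustration: SameFileResult and SafeComparison are invented names, and the delegate stands in for whatever handle-based comparison you use, for example () => PathUtility.IsSameFile(path1, path2). It maps permission and I/O failures to a third "unknown" outcome so the caller can decide what to do.

```csharp
using System;
using System.IO;

public enum SameFileResult { Same, Different, Unknown }

public static class SafeComparison
{
    // 'compare' is any check that opens both files and may throw.
    public static SameFileResult Compare(Func<bool> compare)
    {
        try
        {
            return compare() ? SameFileResult.Same : SameFileResult.Different;
        }
        catch (UnauthorizedAccessException)
        {
            // No permission on the file or on a directory in its path.
            return SameFileResult.Unknown;
        }
        catch (IOException)
        {
            // Missing file, sharing violation, network disconnection, ...
            return SameFileResult.Unknown;
        }
    }
}
```

Other exception types are deliberately left uncaught here, since they more likely indicate a programming error than an unreadable file.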

At first I thought this would be really easy, but this doesn't work:

  string fileName1 = @"c:\vobp.log";
  string fileName2 = @"c:\vobp.log".ToUpper();
  FileInfo fileInfo1 = new FileInfo(fileName1);
  FileInfo fileInfo2 = new FileInfo(fileName2);

  if (!fileInfo1.Exists || !fileInfo2.Exists)
  {
    throw new Exception("one of the files does not exist");
  }

  if (fileInfo1.FullName == fileInfo2.FullName)
  {
    MessageBox.Show("equal"); 
  }

Maybe this library helps: http://www.codeplex.com/FileDirectoryPath. I haven't used it myself.

Edit: See this example on that site:

  //
  // Path comparison
  //
  filePathAbsolute1 = new FilePathAbsolute(@"C:/Dir1\\File.txt");
  filePathAbsolute2 = new FilePathAbsolute(@"C:\DIR1\FILE.TXT");
  Debug.Assert(filePathAbsolute1.Equals(filePathAbsolute2));
  Debug.Assert(filePathAbsolute1 == filePathAbsolute2);

If you need to compare the same file names over and over again, I would suggest you look into canonicalizing those names.

Under a Unix system, there is the realpath() function, which canonicalizes your path. I think that's generally the best bet if you have a complex path. However, it is likely to fail on volumes mounted via network connections.

However, based on the realpath() approach, if you want to support multiple volumes, including network volumes, you could write your own function that checks each directory name in a path and, if it references a volume, determines whether the volume reference in both paths is the same. That being said, the mount point may be different (i.e. the path on the destination volume may not be the root of that volume), so it is not that easy to solve all the problems along the way, but it is definitely possible (otherwise how would it work in the first place?!).

Once the file names are properly canonicalized, a simple string comparison gives you the correct answer.
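In .NET the closest thing to realpath() that I am aware of is File.ResolveLinkTarget, which only exists in .NET 6 and later. The following is a minimal sketch under that assumption; CanonicalPaths and Canonicalize are hypothetical names. It caches results because the point of canonicalizing is to avoid repeating the work. It only follows a link in the final path component, so links in parent directories, hard links, and differently mapped network shares are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CanonicalPaths
{
    // Cache keyed by absolute path. Not thread-safe; entries go stale
    // if a link is changed after it has been resolved.
    private static readonly Dictionary<string, string> cache =
        new Dictionary<string, string>();

    public static string Canonicalize(string path)
    {
        string full = Path.GetFullPath(path);

        string cached;
        if (cache.TryGetValue(full, out cached))
            return cached;

        // Requires .NET 6+. Returns null when 'full' is not a link;
        // with returnFinalTarget = true it follows chains of links.
        FileSystemInfo target = File.ResolveLinkTarget(full, true);
        string canonical = target != null ? target.FullName : full;

        cache[full] = canonical;
        return canonical;
    }
}
```

Two paths can then be compared with string.Equals on their canonical forms, using a case-insensitive comparison where the file system calls for it.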

Rasmus's answer is probably the fastest way if you don't need to compare the same file names over and over again.

You could always compute an MD5 hash of both and compare the results. Not exactly efficient, but easier than manually comparing the files yourself.

Here is a post on how to MD5 a string in C#.
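Reading this suggestion as hashing the contents of each file (an assumption, since the linked post covers hashing a string), a sketch could look like the one below. ContentHash and Md5OfFile are made-up names. Keep in mind what it actually answers: equal hashes mean the two files have the same content, not that the two paths refer to the same file, so two separate copies would also be reported as equal.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentHash
{
    // Returns the MD5 hash of the file's bytes as a hex string.
    public static string Md5OfFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream));
        }
    }
}
```

For a list that should reject duplicate content this may be exactly what you want; for detecting the same file behind two paths, the handle-based comparison is the better fit.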
