Find common parent-path in list of files and directories

I have a list of files and directories, List<string> paths. Now I'd like to calculate the deepest common branch that all of the paths share.

We can assume that they all share a common path, but it is not known in advance.

Let's say I have the following three entries:

  • C:/Hello/World/This/Is/An/Example/Bla.cs
  • C:/Hello/World/This/Is/Not/An/Example/
  • C:/Hello/Earth/Bla/Bla/Bla

This should give the result C:/Hello/, as Earth breaks this "chain" of subdirectories.

Second example:

  • C:/Hello/World/This/Is/An/Example/Bla.cs
  • C:/Hello/World/This/Is/Not/An/Example/

-> C:/Hello/World/This/Is/

How would you proceed? I tried to use string.Split('/'), starting with the first string and checking whether every part of the resulting array is contained in the other strings. However, this would be very expensive, as I'm iterating (list_of_entries)^list_of_entries. Is there a better solution available?

My current attempt would be something like the following (C# + LINQ):

    public string CalculateCommonPath(IEnumerable<string> paths)
    {
        // Find the path with the fewest segments; the common path cannot be longer.
        int minSlash = int.MaxValue;
        string minPath = null;
        foreach (var path in paths)
        {
            int splits = path.Split('/').Length;
            if (minSlash > splits)
            {
                minSlash = splits;
                minPath = path;
            }
        }

        if (minPath != null)
        {
            string[] splits = minPath.Split('/');
            for (int i = 0; i < minSlash; i++)
            {
                // Compare segment i of every path with segment i of the shortest path.
                if (paths.Any(x => x.Split('/')[i] != splits[i]))
                {
                    return string.Join("/", splits.Take(i));
                }
            }
        }
        return minPath;
    }

A function to get the longest common prefix may look like this:

public static string GetLongestCommonPrefix(string[] s)
{
    int k = s[0].Length;
    for (int i = 1; i < s.Length; i++)
    {
        k = Math.Min(k, s[i].Length);
        for (int j = 0; j < k; j++)
            if (s[i][j] != s[0][j])
            {
                k = j;
                break;
            }
    }
    return s[0].Substring(0, k);
}

Then you may need to trim the prefix on the right. For example, we want to return c:/dir instead of c:/dir/file for

c:/dir/file1
c:/dir/file2
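
A minimal sketch of that trimming step, assuming '/' is the separator and the input is the character-level prefix returned by a function such as GetLongestCommonPrefix above. The class and method names (PrefixTrimmer, TrimToDirectory) are hypothetical and only serve the illustration:

```csharp
using System;

public static class PrefixTrimmer
{
    // Hypothetical helper: cuts a character-level common prefix back to the
    // last separator, so "c:/dir/file" becomes "c:/dir".
    public static string TrimToDirectory(string prefix, char separator = '/')
    {
        if (string.IsNullOrEmpty(prefix))
            return string.Empty;

        int lastSeparator = prefix.LastIndexOf(separator);

        // No separator at all: the prefix contains no complete directory.
        return lastSeparator < 0 ? string.Empty : prefix.Substring(0, lastSeparator);
    }
}
```

Note that this sketch always drops whatever follows the last separator. It cannot tell whether the prefix already ends on a complete segment (for instance when one path is an ancestor of another), so that case would need an extra check against the original paths.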

You may also want to normalize the paths before processing. See "Normalize directory names in C#".

I don't know whether this is the best-performing solution (probably not), but it is certainly very easy to implement.

  • Sort your list alphabetically
  • Compare the first entry in that sorted list to the last one, character by character, and stop when you find a difference (the value before the stop is the longest shared prefix of those two strings)

Sample code:

List<string> paths = new List<string>();

paths.Add(@"C:/Hello/World/This/Is/An/Example/Bla.cs");
paths.Add(@"C:/Hello/World/This/Is/Not/An/Example/");
paths.Add(@"C:/Hello/Earth/Bla/Bla/Bla");

List<string> sortedPaths = paths.OrderBy(s => s).ToList();

Console.WriteLine("Most common path here: {0}", sharedSubstring(sortedPaths[0], sortedPaths[sortedPaths.Count - 1]));

And that function, of course:

public static string sharedSubstring(string string1, string string2)
{
    string ret = string.Empty;

    int index = 1;
    // Stop at the end of the shorter string so Substring never goes out of range.
    while (index <= string1.Length && index <= string2.Length
           && string1.Substring(0, index) == string2.Substring(0, index))
    {
        ret = string1.Substring(0, index);
        index++;
    }

    return ret;
} // returns an empty string if no common characters were found

First, sort the list of paths to inspect. Then you can split and compare the first and the last item: if they are the same, proceed to the next segment until you find a difference.

So you just need to sort once and then inspect two items.
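
The idea above could be sketched roughly as follows. The names (SortedCommonPath, Find) are hypothetical, and two assumptions go beyond the description: the sort is ordinal, and every path is given a trailing separator before sorting. Without that second step, an entry such as c:/dir-old/x can sort between c:/dir and c:/dir/x and the first/last comparison would report too long a path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedCommonPath
{
    // Sketch: sort once, then compare only the first and last entries segment by segment.
    public static string Find(IEnumerable<string> paths, char separator = '/')
    {
        // Assumption: terminate every path with exactly one separator and sort ordinally,
        // so that all entries sharing complete leading segments stay between first and last.
        List<string> sorted = paths
            .Select(p => p.TrimEnd(separator) + separator)
            .OrderBy(p => p, StringComparer.Ordinal)
            .ToList();

        if (sorted.Count == 0)
            return string.Empty;

        string[] first = sorted[0].Split(separator);
        string[] last = sorted[sorted.Count - 1].Split(separator);

        // The final element of each array is the empty string after the
        // trailing separator, so it is excluded from the comparison.
        int limit = Math.Min(first.Length, last.Length) - 1;
        int shared = 0;
        while (shared < limit && first[shared] == last[shared])
            shared++;

        return string.Join(separator.ToString(), first.Take(shared));
    }
}
```

This version returns the common directory without a trailing separator (for example C:/Hello rather than C:/Hello/), and the comparison is case-sensitive; both are choices of the sketch rather than requirements of the approach.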

To return c:/dir for

c:/dir/file1
c:/dir/file2

I would code it this way:

public static string GetLongestCommonPrefix(params string[] s)
{
    return GetLongestCommonPrefix((ICollection<string>)s);
}

public static string GetLongestCommonPrefix(ICollection<string> paths)
{
    if (paths == null || paths.Count == 0)
        return null;


    if (paths.Count == 1)
        return paths.First();

    var allSplittedPaths = paths.Select(p => p.Split('\\')).ToList();

    var min = allSplittedPaths.Min(a => a.Length);
    var i = 0;
    for (i = 0; i < min; i++)
    {
        var reference = allSplittedPaths[0][i];
        if (allSplittedPaths.Any(a => !string.Equals(a[i], reference, StringComparison.OrdinalIgnoreCase)))
        {
            break;
        }
    }

    return string.Join("\\", allSplittedPaths[0].Take(i));
}

And here are some tests for it:

[TestMethod]
public void GetLongestCommonPrefixTest()
{
    var str1 = @"C:\dir\dir1\file1";
    var str2 = @"C:\dir\dir1\file2";
    var str3 = @"C:\dir\dir1\file3";
    var str4 = @"C:\dir\dir2\file3";
    var str5 = @"C:\dir\dir1\file1\file3";
    var str6 = @"C:\dir\dir1\file1\file3";


    var res = Utilities.GetLongestCommonPrefix(str1, str2, str3);

    Assert.AreEqual(@"C:\dir\dir1", res);

    var res2 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str4);

    Assert.AreEqual(@"C:\dir", res2);

    var res3 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str5);

    Assert.AreEqual(@"C:\dir\dir1", res3);

    var res4 = Utilities.GetLongestCommonPrefix(str5, str6);

    Assert.AreEqual(@"C:\dir\dir1\file1\file3", res4);

    var res5 = Utilities.GetLongestCommonPrefix(str5);

    Assert.AreEqual(str5, res5);

    var res6 = Utilities.GetLongestCommonPrefix();

    Assert.AreEqual(null, res6);

    var res7 = Utilities.GetLongestCommonPrefix(null);

    Assert.AreEqual(null, res7);
}

I would iterate over each character in the first path, comparing it with the character at the same position in every other path in the collection:

public string FindCommonPath(List<string> paths)
{
    string firstPath = paths[0];
    bool same = true;

    int i = 0;

    string commonPath = string.Empty;

    while (same && i < firstPath.Length)
    {
        for (int p = 1; p < paths.Count && same; p++)
        {
            // Guard against paths that are shorter than the first one.
            same = i < paths[p].Length && firstPath[i] == paths[p][i];
        }

        if (same)
        {
            commonPath += firstPath[i];
        }
        i++;
    }

    return commonPath;
}

You could iterate through the list first to find the shortest path and possibly improve it slightly.
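
A rough sketch of that improvement, under the assumption that "improve" means bounding the loop by the shortest path's length so no per-character length check is needed. The names (BoundedCommonPrefix, Find) are hypothetical, and the result is still a character-level prefix, not yet cut back to a directory boundary:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class BoundedCommonPrefix
{
    public static string Find(IReadOnlyList<string> paths)
    {
        if (paths == null || paths.Count == 0)
            return string.Empty;

        // The common prefix can never be longer than the shortest path,
        // so one pass over the list gives a safe upper bound for the index.
        int limit = paths.Min(p => p.Length);
        string firstPath = paths[0];
        var commonPath = new StringBuilder(limit);

        for (int i = 0; i < limit; i++)
        {
            char c = firstPath[i];
            for (int p = 1; p < paths.Count; p++)
            {
                // First mismatch ends the search; return what has matched so far.
                if (paths[p][i] != c)
                    return commonPath.ToString();
            }
            commonPath.Append(c);
        }

        return commonPath.ToString();
    }
}
```

Using a StringBuilder instead of repeated string concatenation is an additional choice of this sketch, not something the answer above prescribes.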

A function that gives you the longest common directory path with the best possible complexity:

private static string GetCommonPath(IEnumerable<string> files)
{
    // O(N, L) = N*L; N  - number of strings, L - string length

    // if the first and last path from alphabetic order matches, all paths in between match
    string first = null;//smallest string
    string last = null;//largest string

    var comparer = StringComparer.InvariantCultureIgnoreCase;
    // find smallest and largest string:
    foreach (var file in files.Where(p => !string.IsNullOrWhiteSpace(p)))
    {
        if (last == null || comparer.Compare(file, last) > 0)
        {
            last = file;
        }

        if (first == null || comparer.Compare(file, first) < 0)
        {
            first = file;
        }
    }

    if (first == null)
    {
        // the list is empty
        return string.Empty;
    }

    if (first.Length > last.Length)
    {
        // first should not be longer
        var tmp = first;
        first = last;
        last = tmp;
    }

    // get minimal length
    var count = first.Length;
    var found = string.Empty;

    const char dirChar = '\\';
    var sb = new StringBuilder(count);
    for (var idx = 0; idx < count; idx++)
    {
        var current = first[idx];
        var x = char.ToLowerInvariant(current);
        var y = char.ToLowerInvariant(last[idx]);

        if (x != y)
        {
            // first and last string character is different - break
            return found;
        }

        sb.Append(current);

        if (current == dirChar)
        {
            // end of dir character
            found = sb.ToString();
        }
    }

    if (last.Length > count && last[count] == dirChar)
    {
        // whole first is common root:
        return first;
    }

    return found;
}
