Find common parent path in a list of files and directories
I have a list of files and directories, List<string> pathes. Now I'd like to calculate the deepest common branch that all of these paths share.
We can assume that they all share a common path, but it is not known in advance.
Let's say I have the following three entries:
C:/Hello/World/This/Is/An/Example/Bla.cs
C:/Hello/World/This/Is/Not/An/Example/
C:/Hello/Earth/Bla/Bla/Bla
This should give the result C:/Hello/, as Earth breaks this "chain" of subdirectories.
Second example:
-> C:/Hello/World/This/Is/
How would you proceed? I tried to use string.Split('/'), start with the first string, and check whether every part of the resulting array is contained in the other strings. However, this would be a very expensive call, as I'm iterating (list_of_entries)^list_of_entries. Is there a better solution available?
My current attempt would be something like the following (C# + LINQ):
public string CalculateCommonPath(IEnumerable<string> paths)
{
    // Find the path with the fewest components; the common path cannot be deeper than that.
    int minSlash = int.MaxValue;
    string minPath = null;
    foreach (var path in paths)
    {
        int splits = path.Split('/').Length;
        if (minSlash > splits)
        {
            minSlash = splits;
            minPath = path;
        }
    }
    if (minPath != null)
    {
        string[] splits = minPath.Split('/');
        for (int i = 0; i < minSlash; i++)
        {
            // Compare the i-th component of every path, not a string prefix.
            if (paths.Any(x => x.Split('/')[i] != splits[i]))
            {
                return string.Join("/", splits.Take(i));
            }
        }
    }
    return minPath;
}
A function to get the longest common prefix may look like this:
public static string GetLongestCommonPrefix(string[] s)
{
    // k is the length of the common prefix found so far.
    int k = s[0].Length;
    for (int i = 1; i < s.Length; i++)
    {
        k = Math.Min(k, s[i].Length);
        for (int j = 0; j < k; j++)
            if (s[i][j] != s[0][j])
            {
                // First mismatch with s[0]: shorten the prefix.
                k = j;
                break;
            }
    }
    return s[0].Substring(0, k);
}
Then you may need to trim the prefix on the right. For example, we want to return c:/dir instead of c:/dir/file for
c:/dir/file1
c:/dir/file2
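The trimming step could be sketched as follows. This is only an illustration: TrimToDirectory is a hypothetical helper name, and it assumes '/' as the separator and that the character-level prefix ends inside a file or directory name rather than exactly on a complete one.

```csharp
using System;

public static class PrefixTrimSketch
{
    // Hypothetical helper: cuts a character-level prefix back to the last '/'
    // so that the result ends on a directory boundary.
    public static string TrimToDirectory(string prefix)
    {
        int lastSeparator = prefix.LastIndexOf('/');
        // No separator at all means there is no common directory.
        return lastSeparator < 0 ? string.Empty : prefix.Substring(0, lastSeparator);
    }
}
```

With this sketch, TrimToDirectory("c:/dir/file") returns "c:/dir".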
You may also want to normalize the paths before processing. See "Normalize directory names in C#".
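A very small normalization step might look like the sketch below. Normalize is a hypothetical helper, and it only unifies separators and drops trailing ones; it does not resolve relative segments such as "..", for which something like Path.GetFullPath may be more appropriate.

```csharp
using System;

public static class PathNormalizeSketch
{
    // Hypothetical helper: use '/' everywhere and remove trailing separators,
    // so that equal directories compare equal character by character.
    public static string Normalize(string path)
    {
        return path.Replace('\\', '/').TrimEnd('/');
    }
}
```

For example, Normalize(@"C:\Hello\World\") yields "C:/Hello/World".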
I don't know whether this is the best-performing solution (probably not), but it is certainly very easy to implement.
Sample code:
List<string> paths = new List<string>();
paths.Add(@"C:/Hello/World/This/Is/An/Example/Bla.cs");
paths.Add(@"C:/Hello/World/This/Is/Not/An/Example/");
paths.Add(@"C:/Hello/Earth/Bla/Bla/Bla");
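// Ordinal ordering ensures the first and last entries bound the common prefix of all entries.
List<string> sortedPaths = paths.OrderBy(s => s, StringComparer.Ordinal).ToList();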
Console.WriteLine("Most common path here: {0}", sharedSubstring(sortedPaths[0], sortedPaths[sortedPaths.Count - 1]));
And that function, of course:
public static string sharedSubstring(string string1, string string2)
{
    string ret = string.Empty;
    int index = 1;
    // Stop at the end of the shorter string so that Substring cannot throw.
    while (index <= string1.Length && index <= string2.Length
        && string1.Substring(0, index) == string2.Substring(0, index))
    {
        ret = string1.Substring(0, index);
        index++;
    }
    return ret;
} // returns an empty string if no common characters were found
First sort the list of paths to inspect. Then you can split and compare the first and the last item: if a part is the same in both, proceed to the next level until you find a difference.
So you just need to sort once and then inspect two items.
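The sort-then-compare idea could be sketched roughly as below. The class and method names are made up for illustration, and the sketch assumes '/' separators and ordinal ordering. It appends a separator to every path before sorting, so that a name such as "C:/a" cannot be mistaken for a parent of "C:/ab", and then cuts the shared prefix of the first and last entries back to a directory boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedCommonPathSketch
{
    public static string CommonDirectory(IEnumerable<string> paths)
    {
        // Terminate every path with exactly one '/' and sort ordinally.
        List<string> sorted = paths
            .Select(p => p.TrimEnd('/') + "/")
            .OrderBy(p => p, StringComparer.Ordinal)
            .ToList();
        if (sorted.Count == 0) return string.Empty;

        string first = sorted[0];
        string last = sorted[sorted.Count - 1];

        // Length of the character prefix shared by the two extremes.
        int length = 0;
        int limit = Math.Min(first.Length, last.Length);
        while (length < limit && first[length] == last[length]) length++;

        // Cut back to the last separator inside the shared prefix.
        string prefix = first.Substring(0, length);
        int cut = prefix.LastIndexOf('/');
        return cut < 0 ? string.Empty : prefix.Substring(0, cut);
    }
}
```

For the three paths from the question this returns "C:/Hello" (without the trailing separator). Only the first and last entries are compared, so a single pass that tracks the minimum and maximum would work as well as a full sort.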
To return c:/dir for
c:/dir/file1
c:/dir/file2
I would code it this way:
public static string GetLongestCommonPrefix(params string[] s)
{
    return GetLongestCommonPrefix((ICollection<string>)s);
}

public static string GetLongestCommonPrefix(ICollection<string> paths)
{
    if (paths == null || paths.Count == 0)
        return null;
    if (paths.Count == 1)
        return paths.First();

    // Split every path into its components once.
    var allSplittedPaths = paths.Select(p => p.Split('\\')).ToList();
    // The common prefix cannot have more components than the shortest path.
    var min = allSplittedPaths.Min(a => a.Length);
    var i = 0;
    for (i = 0; i < min; i++)
    {
        var reference = allSplittedPaths[0][i];
        // Stop at the first component that differs in any path (case-insensitive).
        if (allSplittedPaths.Any(a => !string.Equals(a[i], reference, StringComparison.OrdinalIgnoreCase)))
        {
            break;
        }
    }
    return string.Join("\\", allSplittedPaths[0].Take(i));
}
And here are some tests for it:
[TestMethod]
public void GetLongestCommonPrefixTest()
{
    var str1 = @"C:\dir\dir1\file1";
    var str2 = @"C:\dir\dir1\file2";
    var str3 = @"C:\dir\dir1\file3";
    var str4 = @"C:\dir\dir2\file3";
    var str5 = @"C:\dir\dir1\file1\file3";
    var str6 = @"C:\dir\dir1\file1\file3";

    var res = Utilities.GetLongestCommonPrefix(str1, str2, str3);
    Assert.AreEqual(@"C:\dir\dir1", res);

    var res2 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str4);
    Assert.AreEqual(@"C:\dir", res2);

    var res3 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str5);
    Assert.AreEqual(@"C:\dir\dir1", res3);

    var res4 = Utilities.GetLongestCommonPrefix(str5, str6);
    Assert.AreEqual(@"C:\dir\dir1\file1\file3", res4);

    var res5 = Utilities.GetLongestCommonPrefix(str5);
    Assert.AreEqual(str5, res5);

    var res6 = Utilities.GetLongestCommonPrefix();
    Assert.AreEqual(null, res6);

    var res7 = Utilities.GetLongestCommonPrefix(null);
    Assert.AreEqual(null, res7);
}
I would iterate over each character in the first path, comparing it with the character at the same position in every other path in the collection:
public string FindCommonPath(List<string> paths)
{
    string firstPath = paths[0];
    bool same = true;
    int i = 0;
    string commonPath = string.Empty;
    while (same && i < firstPath.Length)
    {
        for (int p = 1; p < paths.Count && same; p++)
        {
            // A path shorter than the first one also ends the common part.
            same = i < paths[p].Length && firstPath[i] == paths[p][i];
        }
        if (same)
        {
            commonPath += firstPath[i];
        }
        i++;
    }
    return commonPath;
}
You could iterate through the list first to find the shortest path and possibly improve it slightly.
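One way to read that suggestion is sketched below: the length of the shortest path is computed up front and used as the upper bound for the character loop. The names ShortestFirstSketch and FindCommonPrefix are hypothetical, and the result is a character-level prefix that may still need trimming to a directory boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortestFirstSketch
{
    public static string FindCommonPrefix(IReadOnlyList<string> paths)
    {
        if (paths.Count == 0) return string.Empty;

        // The common prefix can never be longer than the shortest path.
        int limit = paths.Min(p => p.Length);
        string reference = paths[0];

        int length = 0;
        // Advance while every path has the same character at this position.
        while (length < limit && paths.All(p => p[length] == reference[length]))
        {
            length++;
        }
        return reference.Substring(0, length);
    }
}
```

Because the loop never runs past the shortest path, no per-character bounds check is needed inside it.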
A function that gives you the longest common directory path with the best possible complexity:
private static string GetCommonPath(IEnumerable<string> files)
{
    // O(N, L) = N*L; N - number of strings, L - string length
    // if the first and last path in alphabetic order match, all paths in between match
    string first = null; // smallest string
    string last = null;  // largest string
    var comparer = StringComparer.InvariantCultureIgnoreCase;

    // find smallest and largest string:
    foreach (var file in files.Where(p => !string.IsNullOrWhiteSpace(p)))
    {
        if (last == null || comparer.Compare(file, last) > 0)
        {
            last = file;
        }
        if (first == null || comparer.Compare(file, first) < 0)
        {
            first = file;
        }
    }

    if (first == null)
    {
        // the list is empty
        return string.Empty;
    }

    if (first.Length > last.Length)
    {
        // first should not be longer
        var tmp = first;
        first = last;
        last = tmp;
    }

    // get minimal length
    var count = first.Length;
    var found = string.Empty;
    const char dirChar = '\\';
    var sb = new StringBuilder(count);
    for (var idx = 0; idx < count; idx++)
    {
        var current = first[idx];
        var x = char.ToLowerInvariant(current);
        var y = char.ToLowerInvariant(last[idx]);
        if (x != y)
        {
            // first and last string differ at this character - stop
            return found;
        }
        sb.Append(current);
        if (current == dirChar)
        {
            // end of a directory name
            found = sb.ToString();
        }
    }

    // last[count] exists only if last is strictly longer than first
    if (last.Length > count && last[count] == dirChar)
    {
        // whole first is common root:
        return first;
    }
    return found;
}