Implementing an efficient algorithm to find the intersection of two strings
Implement an algorithm that takes two strings as input and returns the intersection of the two, with each letter represented at most once.
Algo: (assuming the language used is C#)
This is an O(n) solution, but it uses extra space: 2 char arrays and a hash table.
Can you guys think of a better solution than this?
How about this ...
// Requires "using System.Linq;"
var s1 = "aabbccccddd";
var s2 = "aabc";
var ans = s1.Intersect(s2); // IEnumerable<char> yielding 'a', 'b', 'c', each once
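If a string is wanted rather than an `IEnumerable<char>`, the LINQ result can be materialized. This is only a minimal sketch, assuming LINQ is acceptable for the exercise; the class and method names (`LinqIntersection`, `Intersect`) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class LinqIntersection
{
    // Enumerable.Intersect has set semantics: each common character is
    // yielded once, in the order it first appears in the first string.
    public static string Intersect(string first, string second)
    {
        return new string(first.Intersect(second).ToArray());
    }
}
```

Calling `LinqIntersection.Intersect("aabbccccddd", "aabc")` should give `"abc"`.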
Haven't tested this, but here's my thought:
Won't use additional memory: it only needs the two original strings, two integers, and an output string (or StringBuilder). As an added bonus, the output values will be sorted too!
Part 2: This is what I'd write (sorry about the comments, new to Stack Overflow):
// Requires "using System;" and "using System.Text;"
private static string intersect(string left, string right)
{
    StringBuilder theResult = new StringBuilder();
    // Sort copies of both inputs so equal characters line up in one merge pass.
    string sortedLeft = sort(left);
    string sortedRight = sort(right);
    int leftIndex = 0;
    int rightIndex = 0;
    // Walk both sorted strings; only a character present in both is part of the intersection.
    while (leftIndex < sortedLeft.Length && rightIndex < sortedRight.Length)
    {
        char leftChar = sortedLeft[leftIndex];
        char rightChar = sortedRight[rightIndex];
        if (leftChar < rightChar)
        {
            leftIndex++;
        }
        else if (leftChar > rightChar)
        {
            rightIndex++;
        }
        else
        {
            // Common character: append it once, skipping repeats of the same letter.
            if (theResult.Length == 0 || theResult[theResult.Length - 1] != leftChar)
            {
                theResult.Append(leftChar);
            }
            leftIndex++;
            rightIndex++;
        }
    }
    return theResult.ToString();
}

// Helper that returns the characters of a string in sorted order.
private static string sort(string value)
{
    char[] chars = value.ToCharArray();
    Array.Sort(chars);
    return new string(chars);
}
I hope that makes more sense.
You don't need the 2 char arrays. The System.String data type has a built-in indexer that returns the char at a given position, so you can just loop from 0 to (String.Length - 1). If you're more interested in speed than in optimizing storage space, you could make a HashSet from one of the strings, then make a second HashSet to hold your final result. Then iterate through the second string, testing each char against the first HashSet, and if it exists, add it to the second HashSet. By the end you already have a single HashSet with all the intersections, and you save yourself the pass through the Hashtable looking for entries with a non-zero value.
EDIT: I entered this before all the comments on the question about not wanting to use any built-in containers at all.
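The two-HashSet idea described in this answer could look roughly like the following. It is a sketch under assumptions, not the answerer's actual code; `HashSetIntersection` and `Intersect` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetIntersection
{
    public static HashSet<char> Intersect(string first, string second)
    {
        // Distinct characters of the first string, for O(1) average lookups.
        var seen = new HashSet<char>(first);
        var result = new HashSet<char>();

        // Index straight into the second string; no char[] copy is needed.
        for (int i = 0; i < second.Length; i++)
        {
            if (seen.Contains(second[i]))
            {
                result.Add(second[i]); // Add silently ignores duplicates
            }
        }
        return result;
    }
}
```

Note that a `HashSet<char>` does not guarantee any enumeration order, so sort the result if ordered output matters.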
Here's how I would do this. It's still O(N) and it doesn't use a hash table, but instead (ideally) one int array of length 26.
Still O(N), with extra space of only 26 ints.
Of course, if you're not limited to only lowercase or only uppercase characters, your array size may need to change.
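The code for this answer did not survive in the thread, so the following is only a guess at what a 26-int version might look like. It assumes the input is limited to lowercase 'a' to 'z' (any other character is skipped), and the names `ArrayIntersection` and `Intersect` are hypothetical.

```csharp
using System;
using System.Text;

public static class ArrayIntersection
{
    public static string Intersect(string first, string second)
    {
        // One slot per letter 'a'..'z':
        // 0 = not seen, 1 = seen in first, 2 = already written to the output.
        int[] state = new int[26];

        foreach (char c in first)
        {
            if (c >= 'a' && c <= 'z') state[c - 'a'] = 1;
        }

        var result = new StringBuilder();
        foreach (char c in second)
        {
            if (c >= 'a' && c <= 'z' && state[c - 'a'] == 1)
            {
                result.Append(c);
                state[c - 'a'] = 2; // report each letter at most once
            }
        }
        return result.ToString();
    }
}
```

The output follows the order in which common letters first appear in the second string. Supporting a larger alphabet would mean a larger array, as the answer notes.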
"with each letter represented at most once"
I'm assuming that this means you just need to know the intersections, and not how many times they occurred. If that's so, then you can trim down your algorithm by making use of yield. Instead of storing the count and continuing to iterate over the second string looking for additional matches, you can yield the intersection right there and continue to the next possible match from the first string.
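One way to sketch the yield idea is an iterator that hands back each common character as soon as it is found, without keeping counts. This version swaps in a `HashSet<char>` for the lookup, which the answer does not specify, and the names `LazyIntersection` and `Intersect` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LazyIntersection
{
    public static IEnumerable<char> Intersect(string first, string second)
    {
        // Assumption: a HashSet is acceptable for membership tests.
        var remaining = new HashSet<char>(first);

        foreach (char c in second)
        {
            // Remove returns true only the first time a common character is hit,
            // so each intersection is yielded once and no counts are stored.
            if (remaining.Remove(c))
            {
                yield return c;
            }
        }
    }
}
```

Because the iterator is lazy, a caller that stops enumerating early (for example, after the first match) does no further work.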