
Determine if one string is a prefix of another

I have written a simple function that determines whether str1 is a prefix of str2. It looks like this (in JS):

function isPrefix(str1, str2) // determine if str1 is a prefix of the candidate string str2
{
    if(str2.length < str1.length) // candidate string can't be shorter than the prefix string
        return false;

    var i = 0;
    // check the bound first so the loop stops at the end of str1
    while(i < str1.length && str1.charAt(i) === str2.charAt(i))
        i++;
    if(i < str1.length) // loop stopped early => a character of str1 did not match
        return false;
    return true;
}

As you can see, it loops through the entire length of the prefix string to determine whether it is a prefix of the candidate string. This means its complexity is O(N), where N is the length of the prefix string. That isn't bad on its own, but it becomes a problem when I have to loop through a huge data set to find which strings start with the prefix string. The complexity then becomes O(M*N), where M is the total number of strings in the data set. Not good.

I explored the Internet a bit and concluded that the best answer would be a Patricia/Radix trie, where strings are stored as prefixes. Even then, when I attempt to insert or look up a string, there will be considerable overhead in string matching if I use the aforementioned prefix-checking function.

Say I had a prefix string 'rom' and a set of candidate words

var dataset = ["random","rapid","romance","romania","rome","rose"];

that would look like this in a radix trie:

         r
       /    \
     a       o
    / \     / \
ndom pid  se  m
             / \
           an   e
          /  \
        ia   ce

This means that, at every node, I will be using the prefix match function to determine which node has a value that matches the prefix string at the current index. Somehow, this solution still seems arduous and does not sit well with me. Is there something better, or any way I can improve the core prefix matching function?

Looks like you've got two different problems.

One is to determine if a string is contained as a prefix in another string. For this I would suggest using a function already implemented in the language's string library. In JavaScript you could do this:

if (str2.indexOf(str1) === 0) {
    // string str1 is a prefix of str2
}

See the documentation for String.indexOf here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf
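One caveat with the indexOf check: when str1 does not occur at position 0, indexOf still has to look for it further along str2 before it can return. Two standard alternatives that only look at the start of the string are sketched below. The helper names isPrefixModern and isPrefixLegacy are made up for illustration; startsWith (ES2015 and later) and lastIndexOf with a fromIndex of 0 are regular String methods.

```javascript
// Hypothetical helper names; both rely only on standard String methods.

// ES2015+: startsWith only compares the leading characters of the candidate.
function isPrefixModern(prefix, candidate) {
    return candidate.startsWith(prefix);
}

// Older engines: with fromIndex 0, lastIndexOf can only match at position 0,
// so it never searches the rest of the candidate.
function isPrefixLegacy(prefix, candidate) {
    return candidate.lastIndexOf(prefix, 0) === 0;
}

console.log(isPrefixModern("rom", "romance")); // true
console.log(isPrefixLegacy("rom", "rose"));    // false
```

Both are still linear in the length of the prefix for a single comparison, so they tidy up the core check rather than change its complexity.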

The other problem is to find out, in a bunch of strings, which ones have a given string as a prefix. For this, building a data structure like a Trie or the one you mention seems like the way to go if you want fast look-ups.
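As a rough illustration of the Trie idea, here is a minimal sketch of a plain (uncompressed) trie, not the radix/Patricia variant from the question. All names (createNode, insert, wordsWithPrefix) are hypothetical. The point is that a look-up walks the prefix once, one character per node, and then only visits the words stored below that node, so no per-string prefix function is needed.

```javascript
// Minimal uncompressed trie; all names here are hypothetical.
function createNode() {
    return { children: new Map(), isWord: false };
}

// Add one word, creating a child node per character where needed.
function insert(root, word) {
    let node = root;
    for (const ch of word) {
        if (!node.children.has(ch)) {
            node.children.set(ch, createNode());
        }
        node = node.children.get(ch);
    }
    node.isWord = true;
}

// Walk down the prefix once, then collect every word stored below that node.
function wordsWithPrefix(root, prefix) {
    let node = root;
    for (const ch of prefix) {
        node = node.children.get(ch);
        if (node === undefined) {
            return []; // no stored word starts with this prefix
        }
    }
    const results = [];
    const stack = [[node, prefix]];
    while (stack.length > 0) {
        const [current, text] = stack.pop();
        if (current.isWord) {
            results.push(text);
        }
        for (const [ch, child] of current.children) {
            stack.push([child, text + ch]);
        }
    }
    return results.sort(); // sorted only to make the output order predictable
}

const root = createNode();
for (const word of ["random", "rapid", "romance", "romania", "rome", "rose"]) {
    insert(root, word);
}
console.log(wordsWithPrefix(root, "rom")); // romance, romania, rome
```

The cost of a query here depends on the length of the prefix and the size of the matching subtree rather than on the total number of stored strings. A radix trie merges single-child chains into one edge to save memory, but the look-up idea is the same.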

Check out this thread on Stack Overflow: How to check if a string "StartsWith" another string? Mark Byers's solution seems to be very efficient. Also, Java has the built-in String methods "endsWith" and "startsWith": http://docs.oracle.com/javase/tutorial/java/data/comparestrings.html
