Fastest way to compare two strings with dynamic whitespace regions?

Question

I have two strings, bigstring and smallstring, and each string is a paragraph of words. However, between each pair of words is a run of whitespace characters (\s in regex) of random length.

So for example, bigstring could be something like "hello      world". The same goes for smallstring.

What I want to be able to do is check whether smallstring is a substring of bigstring (word for word), where every \s+ run is considered the same, and case-insensitively. So for example, if

bigstring = "hello \t\r\n world \n foobar"

smallstring = "HELLO \t world"

then smallstring is a substring of bigstring.

bigstring = "hello \t\r\n world \n foobar"

smallstring = "HEL"

This is not a substring (word for word), because there is no word hel in bigstring.

bigstring = "the \t\r\n nest"

smallstring = "then \n est"

This is also not a substring (word for word).

One method is to tokenize both strings into arrays: the text between \s+ runs becomes the tokens, and the \s+ runs are the delimiters. Then check whether one array is contained in the other, in order and consecutively, comparing tokens case-insensitively.
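For reference, the tokenizing approach described above could look roughly like the sketch below. The names tokenize and containsWords are made up for illustration, and treating an empty smallstring as a match is an assumption rather than something the question specifies.

```javascript
// Split on runs of whitespace and lowercase everything.
// filter(Boolean) drops the empty tokens produced by leading/trailing whitespace.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// True when every token of `small` appears in `big` consecutively and in order.
function containsWords(big, small) {
  const hay = tokenize(big);
  const needle = tokenize(small);
  if (needle.length === 0) return true; // assumption: empty input counts as a match
  for (let i = 0; i + needle.length <= hay.length; i++) {
    let j = 0;
    while (j < needle.length && hay[i + j] === needle[j]) j++;
    if (j === needle.length) return true;
  }
  return false;
}

console.log(containsWords("hello \t\r\n world \n foobar", "HELLO \t world")); // true
console.log(containsWords("hello \t\r\n world \n foobar", "HEL"));            // false
console.log(containsWords("the \t\r\n nest", "then \n est"));                 // false
```

This allocates two arrays and lowercased copies of both strings, which is the overhead the question is hoping to avoid.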

However, in this case I need speed to be the priority, so I am looking for the fastest way.

Does anyone know a way to check this?

I was thinking of perhaps checking these strings by looping through both, character by character, but I am not sure how to do that.

Thanks

Answer 1

I am not sure where this ranks on speed, but does this achieve your goal? (Now edited for the edge case of 'impl' vs. 'mpl' by adding a leading space.)

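var isSubstring = function(bigstring, smallstring) {
  // Pad with a space on each side first, then collapse every whitespace run
  // to a single space. Padding before collapsing keeps inputs that already
  // start or end with whitespace from ending up with doubled spaces.
  bigstring = (" " + bigstring + " ").replace(/\s+/g, " ").toLowerCase()
  smallstring = (" " + smallstring + " ").replace(/\s+/g, " ").toLowerCase()
  return bigstring.indexOf(smallstring) >= 0
}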

Adding a trailing (and, now, leading) space covers the case where smallstring is a single word fragment ('hel' vs. 'hello' in your example above, and 'impl' vs. 'mpl' in the comments below).

Use cases:

bigstring = "hello   \t\r\n  world \n foobar"
smallstring = "HELLO \t world"
console.log(isSubstring(bigstring, smallstring))
//evaluates to true

bigstring = "hello   \t\r\n  world \n foobar"
smallstring = "HEL"
console.log(isSubstring(bigstring, smallstring))
// evaluates to false

bigstring = "impl"
smallstring = "mpl"
console.log(isSubstring(bigstring, smallstring))
// evaluates to false

Answer 2

RegExp is definitely not the fastest, but you can search the big string with a RegExp generated from the small string:

bigstring = "hello \t\r\n world \n foobar"
smallstring = "HELLO \t world"

r = new RegExp( '\\b' + smallstring.replace(/\s+/g, '\\s+') + '\\b', 'i' )

console.log( r.test(bigstring), r )   // true /\bHELLO\s+world\b/i
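One caveat with the RegExp above: smallstring is interpolated as-is, so a word containing regex metacharacters (for example "c++" or "a.b") would either throw or match more than intended, and \b only recognises boundaries next to [A-Za-z0-9_]. A possible variant, sketched here with the hypothetical helpers escapeRegExp and buildWordRegExp, escapes each word and uses whitespace or the string edges as the boundary. It assumes smallstring contains at least one word.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive pattern that matches the words of `smallstring`
// separated by any whitespace run.
function buildWordRegExp(smallstring) {
  const words = smallstring.trim().split(/\s+/).map(escapeRegExp);
  // (?:^|\s) and (?=\s|$) require whitespace or a string edge around the match.
  return new RegExp("(?:^|\\s)" + words.join("\\s+") + "(?=\\s|$)", "i");
}

const bigstring = "hello \t\r\n world \n foobar";
console.log(buildWordRegExp("HELLO \t world").test(bigstring)); // true
console.log(buildWordRegExp("HEL").test(bigstring));            // false
console.log(buildWordRegExp("C++").test("i like c++ a lot"));   // true
```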

A faster case-insensitive string search would most likely use charCodeAt and/or some kind of word/token lookup structure, as, for example, https://github.com/bvaughn/js-search seems to do.
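As a rough illustration of the charCodeAt idea, the sketch below scans both strings in place without building normalized copies. Everything in it (isSpace, lower, skipSpaces, matchAt, scanIsSubstring) is hypothetical, not taken from js-search. It also makes two simplifying assumptions: only ASCII whitespace and ASCII letters are handled, which is narrower than \s and toLowerCase, and smallstring contains at least one word. Whether it actually beats the built-in methods would need to be measured.

```javascript
// Assumption: ASCII whitespace only (space, \t, \n, \v, \f, \r).
const isSpace = (c) => c === 32 || (c >= 9 && c <= 13);
// Assumption: ASCII-only case folding (A-Z -> a-z).
const lower = (c) => (c >= 65 && c <= 90 ? c + 32 : c);

function skipSpaces(s, i) {
  while (i < s.length && isSpace(s.charCodeAt(i))) i++;
  return i;
}

// Try to match all words of `small` against `big`, starting at a word start.
function matchAt(big, start, small) {
  let i = start;
  let j = skipSpaces(small, 0);
  while (j < small.length) {
    const cs = small.charCodeAt(j);
    if (isSpace(cs)) {
      j = skipSpaces(small, j);
      if (j === small.length) break;            // trailing whitespace in small
      if (i >= big.length || !isSpace(big.charCodeAt(i))) return false;
      i = skipSpaces(big, i);                   // any whitespace run matches any other
      continue;
    }
    if (i >= big.length || lower(big.charCodeAt(i)) !== lower(cs)) return false;
    i++;
    j++;
  }
  // The match must end at a word boundary in big.
  return i === big.length || isSpace(big.charCodeAt(i));
}

function scanIsSubstring(big, small) {
  let i = skipSpaces(big, 0);
  while (i < big.length) {
    if (matchAt(big, i, small)) return true;
    while (i < big.length && !isSpace(big.charCodeAt(i))) i++; // skip this word
    i = skipSpaces(big, i);                                    // move to the next word
  }
  return false;
}

console.log(scanIsSubstring("hello \t\r\n world \n foobar", "HELLO \t world")); // true
console.log(scanIsSubstring("hello \t\r\n world \n foobar", "HEL"));            // false
console.log(scanIsSubstring("the \t\r\n nest", "then \n est"));                 // false
```

The worst case is still proportional to the product of the two lengths, since every word start in bigstring may trigger a partial comparison.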

Answer 3

Let F(a) return a unified version of string a. By unified I mean that every run of consecutive whitespace characters is replaced by a single space and all letters are converted to lower case. This function can be computed in linear time, O(|a|).

In this case you need to check whether F(smallstring) is a substring of F(bigstring). To keep the match word for word, F should also put a single space at the start and end of its result (as in Answer 1); otherwise "hel" would match inside "hello". To handle this quickly you can use a standard algorithm such as KMP.
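A minimal sketch of this idea follows, with F written as unify and a textbook Knuth-Morris-Pratt search. The names unify, buildFailureTable, kmpIncludes and isWordSubstring are illustrative only. unify pads its result with one space on each side, which is an addition to F as originally stated, so that matches fall on word boundaries. No claim is made here that a hand-written KMP outruns the engine's built-in indexOf; that would need benchmarking.

```javascript
// F(a): pad with spaces, collapse every whitespace run to one space, lowercase.
function unify(text) {
  return (" " + text + " ").replace(/\s+/g, " ").toLowerCase();
}

// fail[i] = length of the longest proper prefix of pattern[0..i]
// that is also a suffix of it.
function buildFailureTable(pattern) {
  const fail = new Array(pattern.length).fill(0);
  let k = 0;
  for (let i = 1; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) k = fail[k - 1];
    if (pattern[i] === pattern[k]) k++;
    fail[i] = k;
  }
  return fail;
}

// KMP search: O(|text| + |pattern|), never moves backwards in `text`.
function kmpIncludes(text, pattern) {
  if (pattern.length === 0) return true;
  const fail = buildFailureTable(pattern);
  let k = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    while (k > 0 && text[i] !== pattern[k]) k = fail[k - 1];
    if (text[i] === pattern[k]) k++;
    if (k === pattern.length) return true;
  }
  return false;
}

function isWordSubstring(bigstring, smallstring) {
  return kmpIncludes(unify(bigstring), unify(smallstring));
}

console.log(isWordSubstring("hello \t\r\n world \n foobar", "HELLO \t world")); // true
console.log(isWordSubstring("hello \t\r\n world \n foobar", "HEL"));            // false
console.log(isWordSubstring("the \t\r\n nest", "then \n est"));                 // false
```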
