Find optimal set of substrings in given string
I'm trying to find the optimal set of substrings for a given string.
Given string: "FEEJEEDAI"
Substring values:
FE - 1
JE - 2
JEE - 3
AI - 4
DAI - 6
Possible combinations:
1) [FE-JE-DAI] - 1+2+6 = 9
2) [FE-JEE-DAI] - 1+3+6 = 10
3) [FE-JEE-AI] - 1+3+4 = 8
OPTIMAL COMBINATION - 2) [FE-JEE-DAI] scores 10
I think it should go something like this:
1) Check whether the string contains a particular substring:

    var string = "FEEJEEDAI", substring = "JE";
    string.indexOf(substring) !== -1;

2) If true, find its index:

    var subStringIndex = string.indexOf(substring);

3) Create a new tempString to build the combination and 'cut off' the substring from string:

    // slice() takes an end index, not a length
    var tempString = string.slice(subStringIndex, subStringIndex + substring.length);

4) Iterate through string and find the optimal tempString.

I don't know how to build this into a loop and handle the JE vs JEE and AI vs DAI cases.
Basically, you could use an iterative and recursive approach to get all possible substrings of the string.
This solution is split into three parts.
At the start, all substrings of the string are collected in the indices object. The key is the index and the value is an object with a limit, which is the smallest length of the strings in the pattern array. The pattern array contains the index and the found substrings beginning at that index.
indices object from the first example:

    {
        0: {
            limit: 2,
            pattern: [
                { index: 0, string: "FE" }
            ]
        },
        3: {
            limit: 2,
            pattern: [
                { index: 3, string: "JE" },
                { index: 3, string: "JEE" }
            ]
        },
        /* ... */
    }

The main idea is to start at index zero with an empty array for collecting substrings.
To check which parts belong together in a group, take the first substring at the given index (or at the next index that has one), then take its limit property, which is the length of the shortest substring there, and add it to the index. The sum is the exclusive upper bound on the start index of the group's members.
In the second example, the first group consists of 'FE', 'EE' and 'EEJ':

    string      comment
    ----------  -------------------------------------
    01 2345678  indices
    FE|EJEEDAI
    FE|         matching pattern FE  at position 0
     E|E        matching pattern EE  at position 1
     E|EJ       matching pattern EEJ at position 1
    ^^          all starting substrings are in the same group

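To make the grouping rule concrete, here is a minimal sketch that computes only the first group for a string. The helper name `firstGroup` is hypothetical and not part of the answer's code; it assumes all patterns are non-empty strings.

```javascript
// Hypothetical helper: returns the first group of competing matches.
function firstGroup(string, patterns) {
    var matches = [];

    // Collect every occurrence of every pattern.
    patterns.forEach(function (k) {
        var p = string.indexOf(k);
        while (p !== -1) {
            matches.push({ index: p, string: k });
            p = string.indexOf(k, p + 1);
        }
    });
    if (!matches.length) {
        return [];
    }

    // Order by start index, then by length, so the first entry is the
    // shortest match at the smallest index.
    matches.sort(function (a, b) {
        return a.index - b.index || a.string.length - b.string.length;
    });

    // Members of the group start before the shortest first match ends.
    var limit = matches[0].index + matches[0].string.length;
    return matches.filter(function (m) {
        return m.index < limit;
    });
}

var group = firstGroup("FEEJEEDAI", ["FE", "JE", "JEE", "AI", "DAI", "EEJ", "EJE", "EE"]);
console.log(group.map(function (m) { return m.string; })); // [ 'FE', 'EE', 'EEJ' ]
```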
For each member of that group, a new recursion is invoked with an adjusted index and with the substring concatenated to the parts array.
If no more substrings are found, the parts are joined, and the score is calculated and pushed to the result set.
Interpreting the result

    [
        {
            parts: "0|FE|3|JE|6|DAI",
            score: 9
        },
        /* ... */
    ]

parts is a combination of indices and the matching strings at those positions:

    0|FE|3|JE|6|DAI
    ^ ^^            at index 0 found FE
         ^ ^^       at index 3 found JE
              ^ ^^^ at index 6 found DAI

score is calculated from the given weights of the substrings:

    substring  weight
    ---------  ------
        FE        1
        JE        2
        DAI       6
    ---------  ------
    score         9

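As a small illustration of reading the result, the sketch below recomputes a score from a parts string and picks the highest-scoring entry from a result array. The helper names `scoreOf` and `pickBest` are hypothetical, and the sample array is written by hand in the result format shown above.

```javascript
// Hypothetical helpers; not part of the original answer.

// Recompute the score of a "index|substring|..." parts string.
// Index entries have no weight, so they contribute 0.
function scoreOf(parts, weights) {
    return parts.split('|').reduce(function (score, part) {
        return score + (weights[part] || 0);
    }, 0);
}

// Return the entry with the highest score, or null for an empty array.
function pickBest(results) {
    return results.reduce(function (best, r) {
        return !best || r.score > best.score ? r : best;
    }, null);
}

var weights = { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 };
var sample = [
    { parts: "0|FE|3|JE|6|DAI", score: 9 },
    { parts: "0|FE|3|JEE|6|DAI", score: 10 }
];

console.log(scoreOf("0|FE|3|JE|6|DAI", weights)); // 9
console.log(pickBest(sample).parts);              // 0|FE|3|JEE|6|DAI
```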
The third example returns 11 unique combinations.

    function getParts(string, weights) {

        function collectParts(index, parts) {
            var group, limit;

            // skip positions where no substring starts
            while (index < string.length && !indices[index]) {
                index++;
            }
            if (indices[index]) {
                // collect all substrings that start inside the current group
                group = indices[index].pattern;
                limit = index + indices[index].limit;
                while (++index < limit) {
                    if (indices[index]) {
                        group = group.concat(indices[index].pattern);
                    }
                }
                // recurse once per group member
                group.forEach(function (o) {
                    collectParts(o.index + o.string.length, parts.concat(o.index, o.string));
                });
                return;
            }
            // no more substrings: store the combination and its score
            result.push({
                parts: parts.join('|'),
                score: parts.reduce(function (score, part) { return score + (weights[part] || 0); }, 0)
            });
        }

        var indices = {},
            pattern,
            result = [];

        // collect all occurrences of all substrings, keyed by start index
        Object.keys(weights).forEach(function (k) {
            var p = string.indexOf(k);
            while (p !== -1) {
                pattern = { index: p, string: k };
                if (indices[p]) {
                    indices[p].pattern.push(pattern);
                    if (indices[p].limit > k.length) {
                        indices[p].limit = k.length;
                    }
                } else {
                    indices[p] = { limit: k.length, pattern: [pattern] };
                }
                p = string.indexOf(k, p + 1);
            }
        });

        collectParts(0, []);
        return result;
    }

    console.log(getParts("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 }));
    console.log(getParts("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6, EEJ: 5, EJE: 3, EE: 1 }));
    console.log(getParts("EEEEEE", { EE: 2, EEE: 3 }));

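If only the best combination is needed rather than every combination, a dynamic-programming pass over the string may be enough. The sketch below is an alternative technique, not the method used in the answer above; the name `bestScore` is hypothetical, and it assumes non-empty substrings with positive weights.

```javascript
// Sketch of a dynamic-programming alternative (not the answer's method).
// best[i] holds the best score and parts for the prefix string.slice(0, i).
function bestScore(string, weights) {
    var keys = Object.keys(weights),
        best = [{ score: 0, parts: [] }],
        i;

    for (i = 1; i <= string.length; i++) {
        // Default: leave the character at i - 1 uncovered.
        best[i] = best[i - 1];

        keys.forEach(function (k) {
            var start = i - k.length,
                candidate;

            // Does substring k end exactly at position i?
            if (k.length && start >= 0 && string.slice(start, i) === k) {
                candidate = best[start].score + weights[k];
                if (candidate > best[i].score) {
                    best[i] = {
                        score: candidate,
                        parts: best[start].parts.concat(k)
                    };
                }
            }
        });
    }
    return best[string.length];
}

var result = bestScore("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 });
console.log(result.score);           // 10
console.log(result.parts.join('-')); // FE-JEE-DAI
```

The loop does work proportional to the string length times the number of substrings, so it avoids enumerating every combination, at the cost of returning only one optimal answer.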
If you're slicing out the substrings when you find them, search for the biggest ones first, since certain substrings are substrings of other substrings. For example, if you didn't find DAI, and you find AI, it can't be a part of DAI. You want to test for each substring, so you can put each substring into an array and loop through the array.
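One way to read this suggestion is as a greedy pass: sort the substrings by length, longest first, and claim each occurrence whose characters are still free. The sketch below follows that reading; the name `greedyLongestFirst` and the `taken` bookkeeping array are assumptions, not something the answer specifies, and it assumes non-empty substrings.

```javascript
// Sketch of a longest-first greedy pass (one possible reading of the advice).
function greedyLongestFirst(string, weights) {
    var taken = [],   // taken[i] is true once character i belongs to a match
        parts = [],
        keys = Object.keys(weights).sort(function (a, b) {
            return b.length - a.length;
        });

    keys.forEach(function (k) {
        var p = string.indexOf(k),
            i,
            free;

        while (p !== -1) {
            // The occurrence is usable only if none of its characters is taken.
            free = true;
            for (i = p; i < p + k.length; i++) {
                if (taken[i]) {
                    free = false;
                }
            }
            if (free) {
                for (i = p; i < p + k.length; i++) {
                    taken[i] = true;
                }
                parts.push({ index: p, string: k });
            }
            p = string.indexOf(k, p + 1);
        }
    });

    // Report the parts in the order they appear in the string.
    parts.sort(function (a, b) { return a.index - b.index; });
    return {
        parts: parts.map(function (o) { return o.string; }),
        score: parts.reduce(function (s, o) { return s + weights[o.string]; }, 0)
    };
}

var result = greedyLongestFirst("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 });
console.log(result.parts.join('-')); // FE-JEE-DAI
console.log(result.score);           // 10
```

Longest-first is a heuristic and is not guaranteed to be optimal: for "EEEE" with weights { EEE: 3, EE: 2 }, this sketch takes EEE for a score of 3, while EE plus EE would score 4.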