
Find optimal set of substrings in given string

I'm trying to find the optimal set of substrings for a given string.

Given string: "FEEJEEDAI"

Substring values:

FE - 1
JE - 2
JEE - 3
AI - 4
DAI - 6

Possible combinations:

1) [FE-JE-DAI] - 1+2+6 = 9
2) [FE-JEE-DAI] - 1+3+6 = 10
3) [FE-JEE-AI] - 1+3+4 = 8

OPTIMAL COMBINATION - 2) [FE-JEE-DAI] scores 10

I think it should go something like this:

1) Check if the string contains a particular substring:

    var string = "FEEJEEDAI",
        substring = "JE";

    string.indexOf(substring) !== -1;

2) If true, find its index:

    var subStringIndex = string.indexOf(substring);

3) Create a new tempString to build the combination and 'cut off' the substring from string:

    // slice takes an end index, not a length
    var tempString = string.slice(subStringIndex, subStringIndex + substring.length);

4) Iterate through string and find the optimal tempString.

I don't know how to build this into a loop and handle the JE vs JEE and AI vs DAI situations.
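A minimal sketch of why a single `indexOf` pass is not enough here. The helper names `findOccurrences` and `overlaps` are hypothetical, introduced only for illustration. The sketch lists every position of a candidate substring and shows that JE/JEE and AI/DAI compete for the same characters, so only one of each pair can appear in a combination.

```javascript
// Hypothetical helper: every start index of `substring` inside `string`.
function findOccurrences(string, substring) {
    var found = [],
        p = string.indexOf(substring);
    while (p !== -1) {
        found.push({ index: p, string: substring });
        p = string.indexOf(substring, p + 1);
    }
    return found;
}

// Hypothetical helper: two occurrences overlap if their character ranges intersect.
function overlaps(a, b) {
    return a.index < b.index + b.string.length &&
        b.index < a.index + a.string.length;
}

var je = findOccurrences("FEEJEEDAI", "JE")[0],   // { index: 3, string: "JE" }
    jee = findOccurrences("FEEJEEDAI", "JEE")[0], // { index: 3, string: "JEE" }
    ai = findOccurrences("FEEJEEDAI", "AI")[0],   // { index: 7, string: "AI" }
    dai = findOccurrences("FEEJEEDAI", "DAI")[0]; // { index: 6, string: "DAI" }

console.log(overlaps(je, jee)); // true
console.log(overlaps(ai, dai)); // true
console.log(overlaps(je, dai)); // false
```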

Basically, you could use an iterative and recursive approach for getting all possible combinations of the substrings in the string.

This solution is split into three parts:

  1. Preparation
  2. Collecting parts
  3. Calculating score and create result set

Preparation

At the start, all occurrences of the given substrings in the string are collected in the indices object. The key is the index, and the value is an object with a limit, which is the smallest length of the strings in the pattern array. The pattern array contains the index and the found substrings beginning at that index.

indices object from the first example

    {
        0: {
            limit: 2,
            pattern: [
                { index: 0, string: "FE" }
            ]
        },
        3: {
            limit: 2,
            pattern: [
                { index: 3, string: "JE" },
                { index: 3, string: "JEE" }
            ]
        },
        /* ... */
    }

Collecting parts

The main idea is to start at index zero with an empty array for collecting substrings.

To check which parts belong together in a group, take the first substring at the given index, or at the next index that has one. Then take its limit property, which is the length of the shortest substring there, add it to the index, and use the sum as the maximum index for searching group members.

In the second example, the first group consists of 'FE', 'EE' and 'EEJ'.

    string      comment
    ----------  -------------------------------------
    01 2345678  indices
    FE|EJEEDAI
    FE|         matching pattern FE  at position 0
     E|E        matching pattern EE  at position 1
     E|EJ       matching pattern EEJ at position 1
    ^^          all starting substrings are in the same group

For each member of that group, a new recursion is invoked with an adjusted index and with the substring concatenated to the parts array.
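The preparation and grouping rules can be checked in isolation. The following is a sketch only: `buildIndices` and `groupAt` are hypothetical helper names that mirror the described steps, and the patterns are passed as a plain array instead of a weights object.

```javascript
// Hypothetical helper: collect every occurrence of every pattern, keyed by start index.
function buildIndices(string, patterns) {
    var indices = {};
    patterns.forEach(function (k) {
        var p = string.indexOf(k);
        while (p !== -1) {
            indices[p] = indices[p] || { limit: k.length, pattern: [] };
            indices[p].pattern.push({ index: p, string: k });
            // limit tracks the shortest pattern starting at p
            indices[p].limit = Math.min(indices[p].limit, k.length);
            p = string.indexOf(k, p + 1);
        }
    });
    return indices;
}

// Hypothetical helper: all patterns that start before the shortest pattern
// at `index` (or at the next occupied index) ends.
function groupAt(indices, index, length) {
    var group, limit;
    while (index < length && !indices[index]) {
        index++;
    }
    if (!indices[index]) {
        return [];
    }
    group = indices[index].pattern;
    limit = index + indices[index].limit;
    while (++index < limit) {
        if (indices[index]) {
            group = group.concat(indices[index].pattern);
        }
    }
    return group.map(function (o) { return o.string; });
}

var string = "FEEJEEDAI",
    indices = buildIndices(string, ["FE", "JE", "JEE", "AI", "DAI", "EEJ", "EJE", "EE"]);

console.log(groupAt(indices, 0, string.length)); // ["FE", "EEJ", "EE"]
```

The order inside a group follows the order in which the patterns were collected, so 'EEJ' is listed before 'EE' here; the members are the same as in the diagram.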

Calculating score and create result set

If no more substrings are found, the parts are joined, and the score is calculated and pushed to the result set.

Interpreting the result

    [
        {
            parts: "0|FE|3|JE|6|DAI",
            score: 9
        },
        /* ... */
    ]

parts are a combination of indices and matching strings at the position

    0|FE|3|JE|6|DAI
    ^ ^^             at index 0 found FE
         ^ ^^        at index 3 found JE
              ^ ^^^  at index 6 found DAI

score is calculated with the given weights of the substrings

    substring  weight
    ---------  ------
    FE            1
    JE            2
    DAI           6
    ---------  ------
    score         9
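The score table can be reproduced from a `parts` string with a few lines. This is a sketch; `scoreParts` is a hypothetical helper and it assumes that no weighted substring is itself a number such as "3", because index entries and substrings share the same list.

```javascript
// Hypothetical helper: sum the weights of the substrings in a parts string.
function scoreParts(parts, weights) {
    return parts.split('|').reduce(function (score, part) {
        // index entries such as "0" have no weight and contribute 0
        return score + (weights[part] || 0);
    }, 0);
}

var weights = { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 };

console.log(scoreParts("0|FE|3|JE|6|DAI", weights));  // 9
console.log(scoreParts("0|FE|3|JEE|6|DAI", weights)); // 10
```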

The third example returns 11 unique combinations.

    function getParts(string, weights) {

        function collectParts(index, parts) {
            var group, limit;

            // skip positions where no substring starts
            while (index < string.length && !indices[index]) {
                index++;
            }
            if (indices[index]) {
                // group all substrings that start before the shortest one ends
                group = indices[index].pattern;
                limit = index + indices[index].limit;
                while (++index < limit) {
                    if (indices[index]) {
                        group = group.concat(indices[index].pattern);
                    }
                }
                group.forEach(function (o) {
                    collectParts(o.index + o.string.length, parts.concat(o.index, o.string));
                });
                return;
            }
            // no more substrings: store the combination and its score
            result.push({
                parts: parts.join('|'),
                score: parts.reduce(function (score, part) {
                    return score + (weights[part] || 0);
                }, 0)
            });
        }

        var indices = {},
            pattern,
            result = [];

        // preparation: collect every occurrence of every weighted substring
        Object.keys(weights).forEach(function (k) {
            var p = string.indexOf(k);
            while (p !== -1) {
                pattern = { index: p, string: k };
                if (indices[p]) {
                    indices[p].pattern.push(pattern);
                    if (indices[p].limit > k.length) {
                        indices[p].limit = k.length;
                    }
                } else {
                    indices[p] = { limit: k.length, pattern: [pattern] };
                }
                p = string.indexOf(k, p + 1);
            }
        });

        collectParts(0, []);
        return result;
    }

    console.log(getParts("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 }));
    console.log(getParts("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6, EEJ: 5, EJE: 3, EE: 1 }));
    console.log(getParts("EEEEEE", { EE: 2, EEE: 3 }));
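If only the single best combination is needed rather than the full list, a dynamic-programming pass is one possible alternative. This is a different technique from the recursive enumeration above, not part of the original answer, and the names `bestCombination`, `best` and `choice` are illustrative. The sketch assumes non-empty substrings with positive weights.

```javascript
// Sketch of a dynamic-programming alternative (not the answer's method).
// best[i] holds the highest score reachable using only string.slice(i).
function bestCombination(string, weights) {
    var n = string.length,
        keys = Object.keys(weights),
        best = new Array(n + 1).fill(0),
        choice = new Array(n + 1).fill(null),
        parts = [],
        i;

    for (i = n - 1; i >= 0; i--) {
        best[i] = best[i + 1];       // option 1: skip the character at i
        keys.forEach(function (k) {  // option 2: take a substring starting at i
            if (string.startsWith(k, i) && weights[k] + best[i + k.length] > best[i]) {
                best[i] = weights[k] + best[i + k.length];
                choice[i] = k;
            }
        });
    }

    // walk forward through the recorded choices to rebuild the combination
    for (i = 0; i < n;) {
        if (choice[i]) {
            parts.push(choice[i]);
            i += choice[i].length;
        } else {
            i++;
        }
    }
    return { parts: parts, score: best[0] };
}

var first = bestCombination("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 });
console.log(first.parts.join('-'), first.score); // FE-JEE-DAI 10
```

The work grows with the string length times the number of substrings, whereas enumerating every combination can grow much faster on repetitive input such as "EEEEEE".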

If you slice out the substrings as you find them, search for the longest ones first, since some substrings are contained in others. For example, if you didn't find DAI but you do find AI, that AI can't be part of a DAI. You want to test for each substring, so you can put the substrings into an array and loop through it.
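A minimal sketch of this longest-first idea, under stated assumptions: `greedyLongestFirst` is a hypothetical name, substrings are non-empty, and none of them contains a space, because a space is used to blank out characters that are already taken.

```javascript
// Sketch of the longest-first idea; names are illustrative only.
function greedyLongestFirst(string, weights) {
    var remaining = string,
        parts = [],
        score = 0;

    Object.keys(weights)
        .sort(function (a, b) { return b.length - a.length; }) // longest first
        .forEach(function (k) {
            var p = remaining.indexOf(k);
            while (p !== -1) {
                parts.push({ index: p, string: k });
                score += weights[k];
                // blank out the taken characters so shorter substrings cannot reuse them
                remaining = remaining.slice(0, p) + ' '.repeat(k.length) +
                    remaining.slice(p + k.length);
                p = remaining.indexOf(k, p + k.length);
            }
        });

    parts.sort(function (a, b) { return a.index - b.index; });
    return {
        parts: parts.map(function (o) { return o.string; }),
        score: score
    };
}

var greedy = greedyLongestFirst("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 });
console.log(greedy.parts.join('-'), greedy.score); // FE-JEE-DAI 10
```

Longest-first is a heuristic. It gives the best score for this input, but it is not guaranteed to do so in general: with the string "ABC" and the weights { ABC: 1, AB: 5 }, it takes ABC for a score of 1 although AB alone scores 5.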
