Optimizing algorithm for constructing a string given costs to operations

I'm working on the following exercise (not homework), and I initially decided to go with backtracking. The problem is as follows:

You are given as input a target string. Starting with an empty string, you add characters to it until your new string is the same as the target. You have two options to add characters to a string:

You can append an arbitrary character to your new string, with cost x
You can clone any substring of your new string so far, and append it to the end of your new string, with cost y

For a given target, append cost x, and clone cost y, we want to know the cheapest cost of building the target string.

And some examples:

Target "aa", append cost 1, clone cost 2: the cheapest cost is 2:

Start with an empty string, ""
Append 'a' (cost 1), giving the string "a"
Append 'a' (cost 1), giving the string "aa"

Target "aaaa", append cost 2, clone cost 3: the cheapest cost is 7:

Start with an empty string, ""
Append 'a' (cost 2), giving the string "a"
Append 'a' (cost 2), giving the string "aa"
Clone "aa" (cost 3), giving the string "aaaa"

Target "xzxpzxzxpq", append cost 10, clone cost 11: the cheapest cost is 71:

Start with an empty string, ""
Append 'x' (cost 10): "x"
Append 'z' (cost 10): "xz"
Append 'x' (cost 10): "xzx"
Append 'p' (cost 10): "xzxp"
Append 'z' (cost 10): "xzxpz"
Clone "xzxp" (cost 11): "xzxpzxzxp"
Append 'q' (cost 10): "xzxpzxzxpq"

So far so good. I first tried to do it with backtracking, but then the following test case came up:

string bigString = "abcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblg
mbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjoirmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblg
blgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaip";
string doubleIt = bigString + bigString;

Now that's big. Given costs of 1234 and 1235 to append and clone respectively, the total cost of building it is 59249. So no more backtracking for this one, because of the stack overflow. I tried a more efficient approach:

#include <iostream>
#include <vector>
#include <string>
#include <set>

int isWorthClone(const int size, const std::string& target) {
    int worth = 0;
    for (int j = size; j < target.size() and worth < size; j++) {
        if (target[j] == target[worth]) {
            worth++;
        }
        else break;
    }
    return worth;
}

int buildSolution(const std::string& target, int cpyCst, int apndCst) {
    int index = 0;
    int cost = 0;
    while (int(target.size()) != (index)) {
        int hasta = isWorthClone(index, target);
        if (cpyCst < hasta * apndCst) {
            cost += cpyCst;
            index += hasta ;
        }
        else {
            cost += apndCst;
            index++;
        }
    }
    return cost;
}


int main() {

    std::string bigString = "abcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobac
jodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjoirmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfd
mbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaip";
    std::string doubleIt = bigString + bigString;
    std::string target = bigString;
    int copyCost = 1235;
    int appendCost = 1234;
    std::cout << buildSolution(target, copyCost, appendCost) << std::endl;
}

but the output is 3588498, and according to the test case the correct output should be 59249. I can't figure out why this approach gives me that result. I tried debugging it, and it seems that isWorthClone does not find the right position to clone in some cases. It also seems a little strange, because it works for the other cases, but since this one is somewhat "clone expensive", I think the error is propagating.

Any clues as to why this is happening? This is O(n^2), so I think this should be the optimal solution.

Edit:

My code now looks like the following, trying to follow the DP approach:

int canCopy(const int i, const string& target, int posCopied) {
    int iStartArray = 0;
    bool canCopy = true;
    int aux = i;
    while (canCopy) {
        if (aux - 1 + posCopied > target.size() or target[iStartArray] != target[aux - 1]) {
            canCopy = false;
        }
        else {
            posCopied += 1;
            iStartArray++;
            aux++;
        }
    }
    return posCopied;
}

int stringConstruction(string target, int copyCost, int appendCost) {
    vector<int> dp(target.size() + 1, std::numeric_limits<int>::max());

    dp[1] = appendCost;
    for (int i = 2; i < dp.size(); i++) {
        dp[i] = std::min(dp[i], dp[i - 1] + appendCost);
        int posCopied = canCopy(i, target, 0);
        if (posCopied != 0 and (posCopied + i) < dp.size()) {
            dp[posCopied + i] = dp[i] + copyCost;
        }
    }
    return dp[dp.size()-1];
}

This still doesn't work for the test case presented here.

Edit 2: Finally I implemented the solution provided by @David Eisenstat (thanks!), with a really naive approach:

int best_clone(const string& s) {
    int j = s.size() - 1;
    while (s.substr(0, j).find(s.substr(j, s.size() - j)) != std::string::npos) {
        j--;
    }
    return j + 1;
}

int stringConstruction(string target, int copyCost, int appendCost) {
    vector<int> v = vector<int> (1, 0);
    for (int i = 0; i < target.size(); i++) {
        int cost = v[i] + appendCost;
        int j = best_clone(target.substr(0, i+1));
        if (j <= i) {
            cost = std::min(cost, v[j] + copyCost);
        }
        v.push_back(cost);
    }
    return v[v.size() - 1];

}

It seems I misunderstood the problem. This gives the right answer for the test cases, but it takes too long. best_clone needs to be optimized.

Edit 3: (Hope this is the last one)

I added the following class SA for storing the suffix array:

#pragma once
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
#include <chrono>
using namespace std;

typedef struct {
    int index;
    string s;
} suffix;

struct comp
{
    inline bool operator() (const suffix& s1, const suffix& s2)
    {
        return (s1.s < s2.s);
    }
};

class SA
{
private:
    vector<suffix> values;
public:
    SA(const string& s) : values(s.size()) {
        string aux = s;
        for (int i = 0; i < s.length(); i++) {
            values[i].index = i;
            values[i].s = s.substr(i, s.size() - i);;
        }
        sort(values.begin(), values.end(), comp());
    }

    friend ostream& operator<<(ostream& os, const SA& dt)
    {
        for (int i = 0; i < dt.values.size(); i++) {
            os << dt.values[i].index << ": " << dt.values[i].s << "\n";
        }
        return os;
    }

    int search(const string& subst, int i, int j) {
        while (j >= i) {
            int mid = (i + j) / 2;
            if (this->values[mid].s > subst) {
                j = mid-1;
            }
            else if (this->values[mid].s < subst) {
                i = mid+1;
            }
            else return mid;
        }
        return -1;
    }

};

But now I don't know how to search this array for the best clone. (I know this is slow, n*2log(n) I would say, but I think it is going to be good enough for this one.) So now I need to put these parts together.

The problem is that you're making the decision to clone greedily. Let's look at a case where the append cost is 2 and the clone cost is 3. If you process the string aabaaaba, you'll append aab, clone aa, and clone aba, whereas the best solution is to append aaba and clone it.
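To see the gap concretely, here is a small Python sketch of such a greedy strategy. The function names and the exact greedy rule (clone the longest available substring whenever that is cheaper than appending its characters one by one) are assumptions made for illustration; this is not the asker's code.

```python
def longest_clone(s, j):
    """Length of the longest L such that s[j:j+L] occurs inside s[:j]."""
    length = 0
    while j + length < len(s) and s[j:j + length + 1] in s[:j]:
        length += 1
    return length


def greedy_cost(s, append_cost, clone_cost):
    """Hypothetical greedy: always take the longest clone if it beats appending."""
    cost, j = 0, 0
    while j < len(s):
        length = longest_clone(s, j)
        if length * append_cost > clone_cost:
            cost += clone_cost   # clone s[j:j+length] in one step
            j += length
        else:
            cost += append_cost  # append a single character
            j += 1
    return cost


if __name__ == "__main__":
    print(greedy_cost("aabaaaba", append_cost=2, clone_cost=3))
```

On aabaaaba with append cost 2 and clone cost 3, this greedy pays 12 (three appends, then two clones), while appending aaba and cloning it costs 4 * 2 + 3 = 11.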

The fix is dynamic programming: specifically, build an array of the cost to make each prefix of the target string. To fill each entry, take the min of (append cost plus previous entry, clone cost plus cost for the shortest prefix that can be completed with one clone). Since the clone cost is constant, the array is nondecreasing, and therefore we don't need to check all of the possible prefixes.
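Written out as a recurrence, this might look as follows. The notation is an assumption introduced here for illustration: $s$ is the target of length $n$, $x$ and $y$ are the append and clone costs from the question, and $c_i$ is the cheapest cost of building the prefix $s[0..i)$.

```latex
c_0 = 0, \qquad
c_i = \min\bigl(c_{i-1} + x,\; c_{j^*(i)} + y\bigr) \quad (1 \le i \le n),
\qquad
j^*(i) = \min\{\, j < i \;:\; s[j..i) \text{ occurs as a substring of } s[0..j) \,\}
```

When no such $j$ exists, the clone term is simply dropped. Because $c$ is nondecreasing, the smallest valid $j$ always gives the cheapest clone term, which is why only one candidate per prefix has to be examined.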

Depending on the constraints, you may need to construct a suffix array/longest common prefix array (using, e.g., SA-IS) to identify all of the best clones quickly. This will run in time o(n²) for sure (quite possibly O(n), but there are enough moving parts that I don't want to claim that).

This Python is too slow but gets the right answer on the large test case:

def best_clone(s):
    j = len(s) - 1
    while s[j:] in s[:j]:
        j -= 1
    return j + 1


def construction_cost(s, append_cost, clone_cost):
    table = [0]
    for i in range(len(s)):
        cost = table[i] + append_cost
        j = best_clone(s[: i + 1])
        if j <= i:
            cost = min(cost, table[j] + clone_cost)
        table.append(cost)
    return table[len(s)]
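Most of the time above goes into best_clone, which rescans the prefix for every candidate split. One possible way to reach quadratic time while staying in Python, sketched here as an alternative of my own rather than the suffix-array method mentioned above, is to compute the longest common prefix of every pair of positions with a rolling row, and then push clone costs forward. The names longest_clone_lengths and construction_cost_quadratic are hypothetical.

```python
def longest_clone_lengths(s):
    """max_len[j] = longest L such that s[j:j+L] occurs entirely inside s[:j]."""
    n = len(s)
    max_len = [0] * (n + 1)
    prev = [0] * (n + 2)  # prev[j + 1] holds lcp(i + 1, j + 1)
    for i in range(n - 1, -1, -1):
        cur = [0] * (n + 2)
        for j in range(i + 1, n):
            if s[i] == s[j]:
                cur[j] = prev[j + 1] + 1  # lcp(i, j)
                # The source occurrence s[i:i+L] must end at or before j.
                max_len[j] = max(max_len[j], min(cur[j], j - i))
        prev = cur
    return max_len


def construction_cost_quadratic(s, append_cost, clone_cost):
    n = len(s)
    max_len = longest_clone_lengths(s)
    dp = [0] + [float("inf")] * n  # dp[k] = cheapest cost of s[:k]
    for j in range(n):
        dp[j + 1] = min(dp[j + 1], dp[j] + append_cost)
        # Any clone of length 1..max_len[j] starting at j is available.
        for k in range(j + 1, j + max_len[j] + 1):
            dp[k] = min(dp[k], dp[j] + clone_cost)
    return dp[n]


if __name__ == "__main__":
    print(construction_cost_quadratic("xzxpzxzxpq", 10, 11))
```

This uses O(n²) time and O(n) extra memory. It reproduces the three small examples from the question (2, 7 and 71); I have not timed it on the large test case.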

If the limit of your ambitions is quadratic, then we can put the Z function for string matching to good use.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

using Cost = unsigned long long;

// Adapted from https://cp-algorithms.com/string/z-function.html
std::vector<std::size_t> ZFunction(std::string_view s) {
  std::size_t n = s.length();
  std::vector<std::size_t> z(n);
  for (std::size_t i = 1, l = 0, r = 0; i < n; i++) {
    if (i <= r) {
      z[i] = std::min(r - i + 1, z[i - l]);
    }
    while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
      z[i]++;
    }
    if (i + z[i] - 1 > r) {
      l = i;
      r = i + z[i] - 1;
    }
  }
  return z;
}

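// Returns the smallest i such that s[i..] also occurs inside s[0..i), i.e. the
// shortest prefix of s that one clone can extend to all of s. Returns
// s.length() when no non-empty suffix can be cloned.
std::size_t BestClone(std::string_view s) {
  // Reversing turns "suffix of s" into "prefix of r", which is what Z measures.
  std::string r{s};
  std::reverse(r.begin(), r.end());
  auto z = ZFunction(r);
  std::size_t best = 0;
  for (std::size_t i = 0; i < z.size(); i++) {
    // Capping at i keeps the cloned source from overlapping the suffix itself.
    best = std::max(best, std::min(z[i], i));
  }
  return s.length() - best;
}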

Cost ConstructionCost(std::string_view s, Cost append_cost, Cost clone_cost) {
  std::vector<Cost> costs = {0};
  for (std::size_t j = 0; j < s.length(); j++) {
    std::size_t i = BestClone(s.substr(0, j + 1));
    if (i <= j) {
      costs.push_back(
          std::min(costs.back() + append_cost, costs[i] + clone_cost));
    } else {
      costs.push_back(costs.back() + append_cost);
    }
  }
  return costs.back();
}

int main() {
  std::string s;
  while (std::cin >> s) {
    std::cout << ConstructionCost(s, 1234, 1235) << '\n';
  }
}
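For readers who want to trace the reversal step by hand, here is a rough Python port of the two helpers. It is my own translation, using the half-open variant of the Z function and snake_case names that do not appear in the C++ above.

```python
def z_function(s):
    """z[i] = length of the longest common prefix of s and s[i:] (z[0] = 0)."""
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r) is the rightmost match window found so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def best_clone(s):
    """Smallest j such that s[j:] occurs inside s[:j]; len(s) if there is none."""
    z = z_function(s[::-1])  # suffixes of s become prefixes of the reversal
    # min(z[i], i) caps the match so source and copy do not overlap.
    best = max((min(z[i], i) for i in range(len(z))), default=0)
    return len(s) - best


if __name__ == "__main__":
    print(z_function("aabaaaba"[::-1]))
    print(best_clone("aabaaaba"))
```

For aabaaaba the reversed string is abaaabaa, whose Z array is [0, 0, 1, 1, 4, 0, 1, 1]. The entry z[4] = 4 says that the last four characters (aaba) also end four positions earlier, so best_clone returns 8 - 4 = 4: the prefix aaba can be completed with a single clone.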
