
Minimum Cost to reduce the size of array to 1

Given an array of N numbers (not necessarily sorted), we can merge any two numbers into one; the cost of merging the two numbers is the sum of the two values. The task is to find the minimum total cost of merging all the numbers into one.

Example:
Let the array A = [1,2,3,4]

Then, we can remove 1 and 2, add them, and put the sum back into the array. The cost of this step is (1+2) = 3.

Now, A = [3,3,4], Cost = 3

In the second step, we can remove 3 and 3, add them, and put the sum back into the array. The cost of this step is (3+3) = 6.

Now, A = [4,6], Cost = 6

In the third step, we can remove both remaining elements and put the sum back into the array. The cost of this step is (4+6) = 10.

Now, A = [10], Cost = 10

So the total cost turns out to be 19 (10+6+3).

We have to pick the 2 smallest elements at each step to minimize the total cost. A simple way to do this is to use a min-heap: we can read the minimum element in O(1), and insertion and removal are O(log n).

The time complexity of this approach is O(n log n).
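
For reference, a minimal sketch of that heap-based approach (illustrative only; the name minCostHeap and the use of std::priority_queue are just this sketch's choices):

#include <vector>
#include <queue>
#include <functional>

// Repeatedly pop the two smallest values, pay their sum, and push the sum back.
long long minCostHeap(const std::vector<int>& arr) {
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> q(arr.begin(), arr.end());
    long long cost = 0;
    while (q.size() > 1) {
        long long a = q.top(); q.pop();
        long long b = q.top(); q.pop();
        cost += a + b;
        q.push(a + b);
    }
    return cost;
}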

But I tried another approach and wasn't able to find a case where it fails. The basic idea is that the sum of the two smallest elements chosen at any step is never smaller than the sum of the pair chosen before it. So the "temp" array is always sorted, and we can read its minimum element in O(1).

Since I sort the input array and then simply traverse both arrays, the complexity of my approach is also O(n log n).

#include <vector>
#include <algorithm>
using namespace std;

int minCost(vector<int>& arr) {
    sort(arr.begin(), arr.end());
    // temp array will contain the sum of all the pairs of minimum elements
    vector<int> temp;
    // index for arr
    int i = 0;
    // index for temp
    int j = 0;
    int cost = 0;

    // while we have more than 1 element combined in both the input and temp array
    while(arr.size() - i + temp.size() - j > 1) {
        int num1, num2;
        // selecting num1 (minimum element)
        if(i < arr.size() && j < temp.size()) {
            if(arr[i] <= temp[j])
                num1 = arr[i++];
            else
                num1 = temp[j++];
        }
        else if(i < arr.size())
            num1 = arr[i++];
        else if(j < temp.size())
            num1 = temp[j++];

        // selecting num2 (second minimum element)
        if(i < arr.size() && j < temp.size()) {
            if(arr[i] <= temp[j])
                num2 = arr[i++];
            else
                num2 = temp[j++];
        }
        else if(i < arr.size())
            num2 = arr[i++];
        else if(j < temp.size())
            num2 = temp[j++];

        // appending the sum of the minimum elements in the temp array
        int sum = num1 + num2;
        temp.push_back(sum);
        cost += sum;
    }
    return cost;
}
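
For reference, a quick sanity check of the routine on the example above (this tiny main is illustrative and assumes it is compiled together with minCost):

#include <cstdio>

int main() {
    vector<int> a{1, 2, 3, 4};
    // Should print 19, matching the worked example.
    printf("%d\n", minCost(a));
    return 0;
}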

Is this approach correct? If not, please let me know what I am missing and a test case on which this algorithm fails.

SPOJ link for the same problem

The logic seems very solid to me... the computed sums are never decreasing, and therefore at each step you only need to consider the oldest two computed sums, the next two input elements, or the oldest sum and the next input element.

I would just simplify the code:

#include <vector>
#include <algorithm>
#include <stdio.h>

int hsum(std::vector<int> arr) {
    int ni = arr.size(), nj = 0, i = 0, j = 0, res = 0;
    std::sort(arr.begin(), arr.end());
    std::vector<int> temp;   // pairwise sums, produced in non-decreasing order
    // return the smaller of the next unread input element and the oldest unread sum
    auto get = [&]()->int {
        if (j == nj || (i < ni && arr[i] < temp[j])) return arr[i++];
        return temp[j++];
    };
    // stop when only one value (input element or sum) remains unread
    while ((ni-i)+(nj-j)>1) {
        int a = get(), b = get();
        res += a+b;
        temp.push_back(a + b); nj++;
    }
    return res;
}

int main() {
    fprintf(stderr, "%i\n", hsum(std::vector<int>{1,4,2,3}));
    return  0;
}

Very nice idea!

Another improvement comes from noting that the combined number of unread elements in the two arrays being processed (the original one and the temporary one holding the sums) decreases at every step. Since the first step consumes two input elements, the fact that the temporary array grows by one element per step is still not enough for a "walking queue" allocated inside the array itself to catch up with the reading pointer. This means there is no need for a temporary array at all: the space for the sums can be found in the input array itself...

int hsum(std::vector<int> arr) {
    int ni = arr.size(), nj = 0, i = 0, j = 0, res = 0;
    std::sort(arr.begin(), arr.end());
    // sums are written back into arr[0..nj); the write index nj never catches
    // up with the read index i, so no extra storage is needed
    auto get = [&]()->int {
        if (j == nj || (i < ni && arr[i] < arr[j])) return arr[i++];
        return arr[j++];
    };
    while ((ni-i)+(nj-j)>1) {
        int a = get(), b = get();
        res += a+b;
        arr[nj++] = a + b;
    }
    return res;
}

About the error on SPOJ... I tried briefly to search for the problem but didn't succeed. However, I tried generating random arrays of random lengths and checking this solution against a "brute-force" one implemented directly from the specs, and I'm reasonably confident that the algorithm is correct.
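
For illustration, here is a minimal sketch of such a check (not the exact harness used; it assumes the hsum routine above is compiled in the same file). The brute force recursively tries every possible pair to merge and keeps the cheapest total, which is only feasible for tiny arrays:

#include <vector>
#include <cstdio>
#include <cstdlib>

int hsum(std::vector<int> arr);  // the greedy routine shown above

// Exponential brute force taken straight from the problem statement:
// try every pair, merge it, recurse, keep the cheapest total.
int bruteForce(std::vector<int> v) {
    if (v.size() < 2) return 0;
    int best = -1;
    for (size_t i = 0; i < v.size(); i++)
        for (size_t j = i + 1; j < v.size(); j++) {
            std::vector<int> w;
            for (size_t k = 0; k < v.size(); k++)
                if (k != i && k != j) w.push_back(v[k]);
            int s = v[i] + v[j];
            w.push_back(s);
            int c = s + bruteForce(w);
            if (best < 0 || c < best) best = c;
        }
    return best;
}

int main() {
    srand(1234);
    for (int t = 0; t < 1000; t++) {
        std::vector<int> a(1 + rand() % 7);   // small sizes: brute force is exponential
        for (int& x : a) x = rand() % 100;    // small values: no overflow concerns
        if (bruteForce(a) != hsum(a)) { printf("mismatch!\n"); return 1; }
    }
    printf("ok\n");
    return 0;
}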

I know at least one programming arena (Topcoder) where the problems are sometimes carefully crafted so that the computation gives correct results when using unsigned but not int (or when using unsigned long long but not long long) because of integer overflow. I don't know whether SPOJ also does this kind of nonsense (1)... that may be the reason some hidden test case fails...
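
As a rough, hypothetical illustration (the exact SPOJ limits are not restated here): an array of 10^5 values of about 10^6 each already gives a final merge of roughly 10^11, far beyond INT_MAX ≈ 2.1 × 10^9, so a 32-bit accumulator would silently wrap around.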

EDIT

Checking with SPOJ, the algorithm passes when using long long values... this is the entry I used:

#include <stdio.h>
#include <algorithm>
#include <vector>

int main(int argc, const char *argv[]) {
    int n;
    scanf("%i", &n);
    for (int testcase=0; testcase<n; testcase++) {
        int sz; scanf("%i", &sz);
        std::vector<long long> arr(sz);
        for (int i=0; i<sz; i++) scanf("%lli", &arr[i]);

        int ni = arr.size(), nj = 0, i = 0, j = 0;
        long long res = 0;
        std::sort(arr.begin(), arr.end());
        auto get = [&]() -> long long {
            if (j == nj || (i < ni && arr[i] < arr[j])) return arr[i++];
            return arr[j++];
        };
        while ((ni-i)+(nj-j)>1) {
            long long a = get(), b = get();
            res += a+b;
            arr[nj++] = a + b;
        }
        printf("%lli\n", res);
    }
    return 0;
}

PS: this very kind of computation is also exactly what is needed to build a Huffman tree for entropy coding, given the symbol frequency table, so it is not a mere random exercise but has practical applications.
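
For a concrete, textbook-style illustration: with symbol frequencies {5, 9, 12, 13, 16, 45}, the merges are 5+9=14, 12+13=25, 14+16=30, 25+30=55 and 45+55=100, for a total cost of 14+25+30+55+100 = 224, which is exactly the total encoded length in bits (the sum of frequency times code length) of the optimal Huffman code for those frequencies.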


(1) I'm saying "nonsense" because on Topcoder they never set problems that genuinely require 65 bits; so it's not real care about overflows, just traps set for novices. Another practice I consider bad on TC is that some problems are carefully designed so that the correct algorithm in C++ barely fits within the time limit: just use another language (and get, e.g., a 2× slowdown) and you cannot solve the problem.

First of all, think simple!
When using a priority queue, the problem is easy!
For the first test case:

1 6 3 20
// after pushing to Q
1 3 6 20
// then pop the two smallest items, add them, and push the sum back
(1 + 3) 6 20    cost = 4
(4 + 6) 20      cost = 10 + 4 
(10 + 20)       cost = 30 + 14
30              cost = 44
#include<iostream>
#include<queue>
using namespace std;


int main()
{
    int t;
    cin >> t;
    while (t--) {
        int n;
        cin >> n;
        // min-heap: the smallest elements are always on top
        priority_queue<long long int, vector<long long int>, greater<long long int>> q;

        for (int i = 0; i < n; ++i) {
            long long int k;
            cin >> k;
            q.push(k);
        }

        long long int sum = 0;
        // repeatedly merge the two smallest values and push the sum back
        while (q.size() > 1) {
            long long int a = q.top();
            q.pop();
            long long int b = q.top();
            q.pop();
            q.push(a + b);
            sum += a + b;
        }

        cout << sum << "\n";
    }
}

Basically, we need to sort the list in descending order and then compute the cost like this:

A.sort(reverse=True)
cost = 0
for i in range(len(A)):
    cost += A[i] * (i+1)
return cost
