
C++ - Fastest way to add an item to a sorted array

I've got a database with approximately 200,000 items, which is sorted by username. When I add an item to the end of the array and call my quick sort function to sort it, the sort takes almost a second, which is not acceptable. There are definitely some optimisations that can be done. For example, if I sequentially compare each string from n-1 down to 0 and then move items accordingly, performance is much better.

Another idea is to perform a binary search from 0 to n-1; not an actual search, but something similar that takes advantage of the array already being sorted. However, I've failed to write a proper function that returns the index where my new element should be placed.

void quick_sort(int left, int right)
{
    int i = left, j = right;
    if (left >= right) return;
    char  pivotC[128];
    DataEntry *tmp;

    strcpy_a(pivotC, sizeof pivotC, User[(left + right) / 2]->username);

    while (i <= j)
    {
        while (StringCompare(User[i]->username, pivotC))
            i++;
        while (StringCompare(pivotC, User[j]->username))
            j--;
        if (i <= j) 
        {
            tmp = User[i];
            User[i] = User[j];
            User[j] = tmp;
            i++;
            j--;
        }
    }
    if (left < j)
        quick_sort(left, j);
    if (i < right)
        quick_sort(i, right);
}

Any help is greatly appreciated.

The solution is to rewrite your code to use the STL. I don't understand why people write C code in C++.

You need a vector of User:

std::vector<User> users;
// then you can keep it ordered at each insertion
auto it = std::upper_bound(users.begin(), users.end(), user_to_insert,
    [](const auto& lhs, const auto& rhs) { /* return true if lhs orders before rhs; implementation left to the reader */ });
users.insert(it, user_to_insert);

You now have the same functionality in a much nicer and cleaner way.

An easy, direct method, because binary searching is too mainstream. It only needs a few lines:
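To make the comparator concrete, here is a minimal sketch under stated assumptions: `User` is a hypothetical struct with only a `username` field, and `by_username` and `insert_sorted` are illustrative names that do not appear in the question. Because both comparator arguments have the same type here, one function serves `std::upper_bound` and the debug-only precondition check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type; the question's real entries hold more fields.
struct User {
    std::string username;
};

// Strict weak ordering by username (illustrative helper).
bool by_username(const User& lhs, const User& rhs)
{
    return lhs.username < rhs.username;
}

// Inserts `u` into `users`, which must already be sorted by username,
// and returns the index at which it was placed.
std::size_t insert_sorted(std::vector<User>& users, User u)
{
    // Debug-only check of the precondition; compiled out with NDEBUG.
    assert(std::is_sorted(users.begin(), users.end(), by_username));

    // Binary search for the first element that orders after `u`.
    auto it = std::upper_bound(users.begin(), users.end(), u, by_username);

    // The vector shifts the tail by one slot; this part is still O(n).
    it = users.insert(it, std::move(u));
    return static_cast<std::size_t>(it - users.begin());
}
```

Using `std::upper_bound` places a new entry after any existing entries with the same username; `std::lower_bound` would place it before them.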

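int where_to_add(const int array[], int length, int element)
{
    int i;
    // Walk backward past every element greater than the new one.
    // Stop at i == 0 so that array[i-1] never reads before the array.
    for (i = length; i > 0 && array[i-1] > element; i--);
    return i;
}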

Let me know if this is the answer you were looking for.

Reinventing the wheel is fine if you want to learn how to code a binary search; otherwise reusing is better.

std::lower_bound performs a binary search on a sorted range [first, last), returning an iterator to the first element that is not less than x: the searched element x itself if it is already present, otherwise the first element greater than x (or last if there is none). Since the insert member of the standard containers inserts before the given iterator, this iterator can be used as-is. Here's a simple example.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
    // std::vector matches the <vector> include and gives lower_bound
    // random-access iterators, so the search takes O(log n) steps.
    std::vector<int> data = { 1, 5, 7, 8, 12, 34, 52 };

    auto loc = std::lower_bound(data.begin(), data.end(), 10);
    // you may insert 10 here using loc
    std::cout << *loc << '\n';

    loc = std::lower_bound(data.begin(), data.end(), 12);
    // you may skip inserting 12 since it is already present (OR)
    // insert it if you need to; it'd go before the current 12
    std::cout << *loc << '\n';
}

12
12

Binary search will be of limited interest, as you will need to insert anyway, and the insertion remains a time-consuming O(N) operation. So your first idea of a linear search followed by insertion is good enough; you can combine both in a single backward loop. (This is one step of straight insertion sort.)

The truly efficient ways to handle dynamic sorted lists are to maintain a balanced tree or, if you only need fast lookup by key rather than sorted order, to use a hash table.

You can do a binary search like this. If the values are strings, compare them with a string comparison function and let AR[] be an array of strings, or map the strings to integers. As the array is sorted, I think binary search will give you the best performance.
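The single backward loop could look roughly like the sketch below. It assumes the entries are C strings held in an array with at least one spare slot; `insert_backward` and its parameters are illustrative names, not taken from the question.

```cpp
#include <cassert>
#include <cstring>

// Inserts `name` into the sorted array names[0..count-1] with one backward
// pass that shifts larger entries right until the slot is found.
// The array must have room for at least count + 1 entries.
// Returns the index where `name` was stored.
int insert_backward(const char* names[], int count, const char* name)
{
    assert(count >= 0);
    int i = count;
    while (i > 0 && std::strcmp(names[i - 1], name) > 0) {
        names[i] = names[i - 1];   // shift one slot to the right
        --i;
    }
    names[i] = name;               // drop the new entry into the gap
    return i;
}
```

The search and the shifting happen in the same loop, so no separate pass is needed to find the position first.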

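int bsearch(int AR[], int N, int VAL)
{
    int Mid,Lbound=0,Ubound=N-1;

    while(Lbound<=Ubound)
    {
        Mid=Lbound+(Ubound-Lbound)/2;  // avoids overflow of Lbound+Ubound
        if(VAL>AR[Mid])
            Lbound=Mid+1;
        else if(VAL<AR[Mid])
            Ubound=Mid-1;
        else
            return Mid;  // found: VAL can be inserted here
    }

    // Not found: Lbound is the index of the first element greater than VAL,
    // which is where VAL should be inserted. Returning 0 here would be
    // indistinguishable from a match at index 0.
    return Lbound;
}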
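For the string case mentioned above, a hedged sketch of the same idea follows, using `std::strcmp` for the comparison. The name `insertion_index` and the array-of-C-strings layout are assumptions for illustration; the question's real data lives in `User[i]->username`.

```cpp
#include <cassert>
#include <cstring>

// Returns the first index in names[0..n-1] whose string is not less than
// `key`; inserting `key` at that index keeps the array sorted.
int insertion_index(const char* const names[], int n, const char* key)
{
    assert(n >= 0);
    int lo = 0, hi = n;            // search the half-open range [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (std::strcmp(names[mid], key) < 0)
            lo = mid + 1;          // key belongs after mid
        else
            hi = mid;              // key belongs at or before mid
    }
    return lo;
}
```

This variant always returns a valid insertion point, including 0 for an empty array and n when the key is larger than every entry.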

From what I can see, you're using a C array to store your entries. With a huge number of entries, that means a big performance penalty whenever you insert a new entry, because you may need to move a lot of entries in the array.

If you plan to keep a C array and not use an STL ordered container (I'm mostly thinking of std::map), you may try to split your C array into two arrays. The first array contains each key together with an index to an element of the second array. You still need to keep the first array sorted, but each of its elements is only two words (one for the key, one for the index) instead of a big block holding the key and all its values, so moving them should be faster. When inserting an item, you append it at the end of the second array and insert its index, paired with the key, into the first array. If you plan to remove elements dynamically, you can be a little smarter, but your question does not appear to cover that.

But even so, it might still be too slow, so you should indeed consider std::map or a balanced binary tree such as an AVL tree, red-black tree, or splay tree, where you do not need to move elements physically.

If you're sorting an already sorted list that has only a few new, out-of-place trailing items, you should take advantage of the rare case in which insertion sort actually works efficiently. Insertion sort on a sorted list with only a few out-of-place trailing values runs in O(n) time: you're just moving those few values into place, whereas quick sort picks a pivot and goes through its entire process again. Also, if your quick sort has no efficient pivot selection and uses something like an "average of first 3 items" approach on an already sorted list, you're going to be sorting in O(n^2) time.
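A rough sketch of the two-array layout, using std::vector for brevity rather than raw C arrays. `Record`, `Table`, `add`, and `at_rank` are hypothetical names, and the key is stored as a `std::string`, which is larger than the single word the text describes; a pointer into the record would be closer to that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical large record; only `username` matters for ordering.
struct Record {
    std::string username;
    char payload[256];
};

struct Table {
    // Records stay in insertion order and are never shifted.
    std::vector<Record> records;
    // Small (key, record slot) pairs, kept sorted by key.
    std::vector<std::pair<std::string, std::size_t>> index;

    void add(const Record& r)
    {
        records.push_back(r);      // append the big block at the end
        std::pair<std::string, std::size_t> entry(r.username, records.size() - 1);
        // Only the small pairs move when the sorted index makes room.
        auto it = std::lower_bound(index.begin(), index.end(), entry);
        index.insert(it, entry);
    }

    // Returns the record with the given rank in username order.
    const Record& at_rank(std::size_t rank) const
    {
        assert(rank < index.size());
        return records[index[rank].second];
    }
};
```

Insertion is still linear in the number of entries, but each shifted element is a small pair instead of a whole record.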
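As an illustration of the std::map route, the sketch below keys entries by username. `Entry`, `UserMap`, `add_user`, and `sorted_names` are made-up names for this example. std::map is typically implemented as a red-black tree, so an insertion costs O(log n) and moves no other elements.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record type standing in for the question's entries.
struct Entry {
    std::string username;
    int score;
};

using UserMap = std::map<std::string, Entry>;

// Adds `e` keyed by its username. Returns false, leaving the map
// unchanged, if that username is already present.
bool add_user(UserMap& users, const Entry& e)
{
    return users.emplace(e.username, e).second;
}

// Collects the usernames; iterating a std::map visits keys in sorted order.
std::vector<std::string> sorted_names(const UserMap& users)
{
    std::vector<std::string> out;
    for (const auto& kv : users)
        out.push_back(kv.first);
    return out;
}
```

If duplicate usernames must be allowed, std::multimap offers the same ordering without the uniqueness constraint.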
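One way to sketch this in C++ is shown below. It is a variation on plain insertion sort rather than the textbook loop: `std::upper_bound` finds each position by binary search and `std::rotate` does the shifting. `sort_tail` and `sorted_count` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts `v`, given that v[0..sorted_count-1] is already sorted and only
// the trailing elements are out of place.
template <typename T>
void sort_tail(std::vector<T>& v, std::size_t sorted_count)
{
    assert(sorted_count <= v.size());
    auto first_unsorted = v.begin() + static_cast<std::ptrdiff_t>(sorted_count);
    for (auto it = first_unsorted; it != v.end(); ++it) {
        // Find where *it belongs inside the sorted prefix [begin, it).
        auto pos = std::upper_bound(v.begin(), it, *it);
        // Rotate it into place, shifting [pos, it) right by one slot.
        std::rotate(pos, it, it + 1);
    }
}
```

With k trailing items, this does about k binary searches and at most k shifts of up to n elements each, so the cost stays linear in n when k is a small constant.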

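// Returns the index at which t should be inserted into the sorted range
// c[lo..hi] (both inclusive), or -1 if t is already present.
template <typename Container, typename Unit>
int add(const Container& c, int lo, int hi, const Unit& t)
{
    if(lo>hi)               // empty range: t belongs at lo
        return lo;
    if(t<c[lo])             // smaller than everything in the range
        return lo;
    if(c[hi]<t)             // larger than everything in the range
        return hi+1;
    int m=lo+(hi-lo)/2;
    if(c[m]==t)
        return -1;
    if(t<c[m])
        return add(c,lo,m-1,t);   // continue in the lower half
    return add(c,m+1,hi,t);       // continue in the upper half
}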

It will probably give you the index at which to add the element. I hope it helps. It assumes you do not need to add the element when it is already present.
