
C++: Time complexity of using STL's sort in order to sort a 2d array of integers on different columns

Let's say we have the following 2D array of integers:

1 3 3 1
1 0 2 2
2 0 3 1
1 1 1 0
2 1 1 3

I was trying to create an implementation where the user gives as input the array itself and a string. For the array above, an example string is "03", which means that the user wants to sort the array by the first column and then by the fourth column.

In this case the result of the sort would be the following:

1 1 1 0
1 3 3 1
1 0 2 2
2 0 3 1
2 1 1 3

I didn't know much about the compare functions used by STL's sort function, but after some searching I came up with the following simple implementation.

I created a class called Comparator, in Comparator.h:

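    #include <cstddef>
    #include <string>
    #include <vector>

    class Comparator {
    private:
        std::string attr;  // e.g. "03": one digit per column to sort on

    public:
        Comparator(std::string attr) { this->attr = attr; }

        // Returns true if the row `first` must come before the row `second`.
        bool operator()(const int* first, const int* second) const {
            std::vector<int> left;
            std::vector<int> right;
            std::size_t i;
            // Copy the selected columns of each row.
            for (i = 0; i < attr.size(); i++) {
                left.push_back(first[attr.at(i) - '0']);
                right.push_back(second[attr.at(i) - '0']);
            }
            // Compare the copies column by column.
            for (i = 0; i < left.size(); i++) {
                if (left[i] < right[i]) return true;
                else if (left[i] > right[i]) return false;
            }
            return false;
        }
    };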

The comparator needs the information in the string, so the string is stored as a private member of the class. operator() takes two parameters, first and second, each of which points to a row. From these I build two vectors: left holds the values of the first row in the columns named by the string, and right holds the corresponding values of the second row.

Then I compare the two vectors element by element and return true or false. The user invokes the comparator through this function in Sorting.cpp:

    void Sorting::applySort(int **data, std::string attr, int amountOfRows) {
        std::sort(data, data + amountOfRows, Comparator(attr));
    }

Here is an example use:

    int main(void) {
        // create a data[][] variable and fill it with integers
        Sorting sort;

        sort.applySort(data, "03", number_of_rows);
    }

I have two questions:

First question

Can my implementation be improved? I use extra variables, such as the left and right vectors, and extra for loops, all of which add cost to the sorting operation.

Second question

Given this extra cost, how much worse does the time complexity of the sort become? I know that STL's sort is O(n*log(n)), where n is the number of integers to sort. Here n means something different: it is the number of rows, and each row can hold up to m integers, which the Comparator class reads inside its overloaded operator() using the extra vectors and for loops.

Because I'm not sure exactly how STL's sort is implemented, I can only estimate. My initial estimate is O(n*m*log(n)), where m is the number of columns that matter for the sort, but I'm not 100% certain about it.

Thank you in advance

You can certainly improve your comparator. There's no need to copy the columns and then compare them. Instead of the two push_back calls, just compare the values and either return true, return false, or continue the loop according to whether they're less, greater, or equal.

The relevant part of the complexity of sort is that it performs O(n * log n) comparisons, where n is the number of elements being sorted. (C++11 guarantees this in the worst case; C++03 only promises it on average.) So provided your comparator is O(m), your estimate of O(n*m*log(n)) is OK for sorting the n rows. Since attr.size() <= m, you're right.
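To make the two factors in that estimate visible, here is a small sketch that counts how often std::sort calls the comparator and how many column pairs the comparator inspects in total. The names SortCost, CountingComparator and measureSort are made up for this illustration and are not part of the question's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping struct: totals gathered during one sort.
struct SortCost {
    std::size_t comparisons = 0;  // calls to operator()
    std::size_t cellPairs = 0;    // column pairs inspected across all calls
};

// Hypothetical comparator: orders rows by the given columns and records its work.
struct CountingComparator {
    std::vector<std::size_t> columns;  // columns to compare, most significant first
    SortCost* cost;                    // where the counts are accumulated

    bool operator()(const int* first, const int* second) const {
        ++cost->comparisons;
        for (std::size_t column : columns) {
            ++cost->cellPairs;
            if (first[column] < second[column]) return true;
            if (first[column] > second[column]) return false;
        }
        return false;  // equal on every selected column
    }
};

// Sorts the row pointers and reports how much work the comparator did.
SortCost measureSort(std::vector<const int*>& rows,
                     const std::vector<std::size_t>& columns) {
    SortCost cost;
    std::sort(rows.begin(), rows.end(), CountingComparator{columns, &cost});
    return cost;
}
```

Each call inspects at most columns.size() pairs, so cellPairs never exceeds comparisons * m. The exact number of comparisons depends on the standard library implementation, so treat the counts as an illustration rather than a benchmark.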

First question: you don't need left and right. You add elements one by one and then iterate over the vectors in the same order. So instead of pushing values into vectors and then iterating over them, simply use the values as you generate them in the first loop, like so:

    for(i=0;i<attr.size();i++){
            int left = first[attr.at(i) - '0'];
            int right = second[attr.at(i) - '0'];
            if(left < right) return true;
            else if(left > right) return false;
    }
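Dropped into the class from the question, that loop might look like the sketch below. It assumes every character of attr is a digit naming a valid column, and it uses a free function applySort in place of the question's Sorting::applySort so that the example stands on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Sketch of the question's comparator without the temporary vectors.
// Assumption: every character of attr is a digit naming a valid column.
class Comparator {
public:
    explicit Comparator(std::string attr) : attr_(std::move(attr)) {}

    // Returns true if the row `first` must come before the row `second`.
    bool operator()(const int* first, const int* second) const {
        for (std::size_t i = 0; i < attr_.size(); ++i) {
            const int column = attr_[i] - '0';
            if (first[column] < second[column]) return true;
            if (first[column] > second[column]) return false;
            // Equal in this column: fall through to the next one.
        }
        return false;  // equal on every selected column
    }

private:
    std::string attr_;
};

// Free-function stand-in for Sorting::applySort from the question.
void applySort(int** data, const std::string& attr, int amountOfRows) {
    std::sort(data, data + amountOfRows, Comparator(attr));
}
```

The final return false matters: when two rows agree on every selected column, neither is less than the other, which is what std::sort requires of a strict weak ordering.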

Second question: can the time complexity be improved? Not with a sorting algorithm that relies on direct comparisons. On the other hand, the problem you are solving here is somewhat similar to radix sort. So I believe you should be able to do the sorting in O(n*m), where m is the number of sorting criteria, provided the values in each column come from a small bounded range.
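A rough sketch of that radix-sort idea follows. It makes one stable counting-sort pass per sort column, starting with the least significant column (the last character of attr). The function names and the maxValue parameter are assumptions made for this example: the code is only valid if every value in the sorted columns lies in the range 0..maxValue.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One stable counting-sort pass over the row pointers, keyed on one column.
// Assumption: every value in that column lies in [0, maxValue].
void countingSortByColumn(std::vector<int*>& rows, std::size_t column, int maxValue) {
    // start[v] will become the output index of the first row whose key is v.
    std::vector<std::size_t> start(static_cast<std::size_t>(maxValue) + 2, 0);
    for (const int* row : rows) {
        ++start[static_cast<std::size_t>(row[column]) + 1];  // histogram, shifted by one
    }
    for (std::size_t v = 1; v < start.size(); ++v) {
        start[v] += start[v - 1];  // prefix sums
    }

    std::vector<int*> sorted(rows.size());
    for (int* row : rows) {
        // Rows with equal keys keep their relative order, so the pass is stable.
        sorted[start[static_cast<std::size_t>(row[column])]++] = row;
    }
    rows.swap(sorted);
}

// LSD radix sort: process the sort columns from least to most significant.
// Assumption: attr holds only digits that name valid columns.
void radixSortRows(std::vector<int*>& rows, const std::string& attr, int maxValue) {
    for (std::size_t i = attr.size(); i-- > 0;) {
        countingSortByColumn(rows, static_cast<std::size_t>(attr[i] - '0'), maxValue);
    }
}
```

Each pass costs O(n + k) with k = maxValue + 1, so the whole sort is O(m * (n + k)). That reduces to O(n*m) only when k is no larger than a constant multiple of n; for columns with a wide value range the comparison sort is likely the better choice.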

1) First, convert the string into an integer array in the constructor, validating that each value is less than the number of columns.

(You could also have another constructor that takes an integer array as a parameter. A slight enhancement is to allow negative values to indicate that the sort order is reversed for that column. In that case the column numbers would be 1-based, with values in -N..-1 and 1..N.)

2) There is no need for the intermediate left, right arrays.
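The two points above could be combined along the following lines. This is only a sketch: the class name KeyedComparator, the columnCount parameter, the helper sortRows and the choice to throw std::invalid_argument are all assumptions made for illustration. Keys are stored as signed 1-based column numbers, so +k means column k ascending and -k means column k descending.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical comparator that parses and validates its sort keys once.
class KeyedComparator {
public:
    // Digit string such as "03": 0-based columns, all ascending.
    KeyedComparator(const std::string& attr, int columnCount) {
        for (char c : attr) {
            const int column = c - '0';
            if (c < '0' || c > '9' || column >= columnCount) {
                throw std::invalid_argument("invalid column in sort string");
            }
            keys_.push_back(column + 1);  // store as 1-based, ascending
        }
    }

    // Signed 1-based keys: +k sorts column k ascending, -k descending.
    KeyedComparator(const std::vector<int>& keys, int columnCount) : keys_(keys) {
        for (int key : keys_) {
            if (key == 0 || std::abs(key) > columnCount) {
                throw std::invalid_argument("invalid column in sort keys");
            }
        }
    }

    bool operator()(const int* first, const int* second) const {
        for (int key : keys_) {
            const std::size_t column = static_cast<std::size_t>(std::abs(key) - 1);
            if (first[column] == second[column]) continue;  // tie: try next key
            const bool less = first[column] < second[column];
            return key > 0 ? less : !less;  // flip the result for descending keys
        }
        return false;  // equal on every key
    }

private:
    std::vector<int> keys_;
};

// Hypothetical helper mirroring applySort from the question.
void sortRows(int** data, int rowCount, const KeyedComparator& comparator) {
    std::sort(data, data + rowCount, comparator);
}
```

For example, sortRows(data, 5, KeyedComparator("03", 4)) reproduces the ordering from the question, while KeyedComparator(std::vector<int>{1, -4}, 4) sorts by the first column ascending and the fourth descending. Spelling out std::vector<int> avoids an ambiguity between the two constructors when passing a braced list.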
