
C++ STL Vector Sorting - corrupting & zeroing out

The program I am developing processes very large amounts of data and generates at least 2^34 boolean values. These data are statically generated and cleared throughout the program run (only a portion is sorted at each instance), and finally a vector of at least 2^21 rows of statistical data is passed to the final stage for further processing.

However, the STL sort fails for some input data. After sorting finishes, some of the vector rows are zeroed out or corrupted. It seems my only option is to hand-code a hybrid Quicksort/Insertion sort algorithm.

I would appreciate your thoughts. Cheers.


Data structure for the final stage:

struct statisticalValues{
    unsigned long long id;      //index id
    unsigned int col_Sum;       //Sum: total number of 1s for each combination
    unsigned int col_Relevancy; //Relevancy = total number of 1s produced by (Comb AND Rel)
    float col_Sensitivity;      //Sensitivity= Relevancy / X
    float col_Precision;        //Precision= Relevancy / Sum
};
extern vector<statisticalValues> statistics;

Calling STL Sort:

sort(statistics.begin(), statistics.end(), BySensitivity());

The comparison criteria:

#define EPSILON 0.0001 // user-defined tolerance for equality of floating-point numbers
struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;

        if((sensitivityDif < EPSILON) && (sensitivityDif > -EPSILON)){
            return ((b.col_Precision - a.col_Precision) < EPSILON);
        }else{
            return (sensitivityDif < -EPSILON);
        }
    }
};

The rows of the sample data that will be corrupted (in no particular order):

id,col_Sum,col_Relevancy,col_Sensitivity,col_Precision
1568676,5353,3696,94.166,69.045
1770228,5353,3696,94.166,69.045
2040533,5353,3696,94.166,69.045
2053376,5353,3696,94.166,69.045
1231712,4668,3425,87.261,73.372
1946656,4668,3425,87.261,73.372
1948021,4668,3425,87.261,73.372

After corrupting & zeroing out by STL Sorting:

id,col_Sensitivity,col_Precision
10540996614775448722,5.8399e-34,5.8399e-34
8589934369,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000


After implementing suggested modifications:

The comparison criteria:

struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;

        if((sensitivityDif <= EPSILON) && (sensitivityDif >= -EPSILON)){
            return ((b.col_Precision - a.col_Precision) < -EPSILON);
        }else{
            return (sensitivityDif < -EPSILON);
        }
    }
};

Thanks to @Mark-B, @btilly, @David-Thornley, @sth & @Daniel-Gallagher

The STL sort can corrupt data if the comparison operator can produce inconsistent results, such as x < y < z < x.

Your comparison operator can produce inconsistent results.

Your comparator doesn't implement a strict weak ordering. For example, for two items A and B with equal col_Sensitivity and col_Precision, both A < B and B < A are true. As you can imagine, trying to sort with a comparison function that doesn't actually provide an ordering can produce undefined behavior.
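To make the violation concrete, here is a minimal sketch that feeds two of the question's sample rows to the original comparator. The names OriginalBySensitivity, violatesAsymmetry, sampleRowA and sampleRowB are made up for this illustration, and EPSILON is written as a constant instead of a macro; the comparison logic itself is copied from the question.

```cpp
#include <vector>

// Same layout as the struct in the question.
struct statisticalValues {
    unsigned long long id;
    unsigned int col_Sum;
    unsigned int col_Relevancy;
    float col_Sensitivity;
    float col_Precision;
};

const double EPSILON = 0.0001; // same tolerance as the question's macro

// The question's original comparison logic, unchanged.
struct OriginalBySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;
        if ((sensitivityDif < EPSILON) && (sensitivityDif > -EPSILON)) {
            return ((b.col_Precision - a.col_Precision) < EPSILON);
        } else {
            return (sensitivityDif < -EPSILON);
        }
    }
};

// Hypothetical helper: true when comp claims both a < b and b < a,
// which a strict weak ordering never allows.
template <typename Comp>
bool violatesAsymmetry(statisticalValues const &a, statisticalValues const &b, Comp comp) {
    return comp(a, b) && comp(b, a);
}

// Two of the rows listed in the question (same sensitivity and precision).
inline statisticalValues sampleRowA() { return {1568676ULL, 5353u, 3696u, 94.166f, 69.045f}; }
inline statisticalValues sampleRowB() { return {1770228ULL, 5353u, 3696u, 94.166f, 69.045f}; }
```

Calling violatesAsymmetry(sampleRowA(), sampleRowB(), OriginalBySensitivity()) returns true, and the comparator even reports a row as less than itself, so std::sort is given contradictory answers for exactly the rows that ended up corrupted.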

Thanks to (and quoting) @David Thornley for the standard reference:

Standard, part of 25.3/3: "For the algorithms to work correctly, comp has to induce a strict weak ordering on the values." This means that the behavior without a strict weak ordering is undefined (the Standard says nothing about what happens in that case).
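A strict weak ordering also requires that "equivalent" (neither element less than the other) be transitive, and a tolerance-based comparison generally cannot guarantee that. The sketch below uses the revised comparator from the question's edit with three made-up sensitivity values spaced 0.00008 apart; makeRow and equivalent are hypothetical helpers, not part of the original code.

```cpp
#include <vector>

struct statisticalValues {
    unsigned long long id;
    unsigned int col_Sum;
    unsigned int col_Relevancy;
    float col_Sensitivity;
    float col_Precision;
};

const double EPSILON = 0.0001; // same tolerance as the question's macro

// The revised comparator from the question's edit (still tolerance-based).
struct RevisedBySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;
        if ((sensitivityDif <= EPSILON) && (sensitivityDif >= -EPSILON)) {
            return ((b.col_Precision - a.col_Precision) < -EPSILON);
        } else {
            return (sensitivityDif < -EPSILON);
        }
    }
};

// Hypothetical helper: a row with the given sensitivity and a fixed precision.
inline statisticalValues makeRow(float sensitivity) {
    return {0ULL, 0u, 0u, sensitivity, 50.0f};
}

// Hypothetical helper: "equivalent" in the sorting sense means
// neither element compares less than the other.
inline bool equivalent(statisticalValues const &a, statisticalValues const &b) {
    RevisedBySensitivity comp;
    return !comp(a, b) && !comp(b, a);
}

// Illustrative values: each neighbour is within EPSILON, the two ends are not.
inline statisticalValues rowLow()  { return makeRow(1.00000f); }
inline statisticalValues rowMid()  { return makeRow(1.00008f); }
inline statisticalValues rowHigh() { return makeRow(1.00016f); }
```

Here rowLow() is equivalent to rowMid() and rowMid() is equivalent to rowHigh(), yet rowLow() and rowHigh() are not equivalent. The revised comparator no longer contradicts itself on identical rows, but it can still fall short of a strict weak ordering when values sit close to the tolerance boundary.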

I think in this case you just want to remove all the epsilon checks completely:

struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;

        // Exact comparison: fall back to precision only on a true tie.
        if(sensitivityDif == 0.0f){
            return ((b.col_Precision - a.col_Precision) < 0.0f);
        }else{
            return (sensitivityDif < 0.0f);
        }
    }
};
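The same exact, descending comparison can also be written with std::tie (C++11), which compares the two fields lexicographically. This is only a sketch of an alternative spelling, not the code from the answer; BySensitivityExact, sampleRows and sortedCopy are names invented for the example, and it assumes none of the values is NaN (NaN would break the ordering again).

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

struct statisticalValues {
    unsigned long long id;
    unsigned int col_Sum;
    unsigned int col_Relevancy;
    float col_Sensitivity;
    float col_Precision;
};

// Descending by sensitivity, then descending by precision.
// The operands are swapped (b on the left) to get descending order.
struct BySensitivityExact {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        return std::tie(b.col_Sensitivity, b.col_Precision) <
               std::tie(a.col_Sensitivity, a.col_Precision);
    }
};

// The sample rows from the question, deliberately listed out of order.
inline std::vector<statisticalValues> sampleRows() {
    return {
        {1231712ULL, 4668u, 3425u, 87.261f, 73.372f},
        {1568676ULL, 5353u, 3696u, 94.166f, 69.045f},
        {1946656ULL, 4668u, 3425u, 87.261f, 73.372f},
        {1770228ULL, 5353u, 3696u, 94.166f, 69.045f},
        {1948021ULL, 4668u, 3425u, 87.261f, 73.372f},
        {2040533ULL, 5353u, 3696u, 94.166f, 69.045f},
        {2053376ULL, 5353u, 3696u, 94.166f, 69.045f},
    };
}

// Hypothetical helper: returns a sorted copy so the input stays untouched.
inline std::vector<statisticalValues> sortedCopy(std::vector<statisticalValues> rows) {
    std::sort(rows.begin(), rows.end(), BySensitivityExact());
    return rows;
}
```

With this comparator, sortedCopy(sampleRows()) keeps all seven rows, placing the 94.166 rows ahead of the 87.261 rows. Rows that compare equal may appear in any relative order, since std::sort is not stable; std::stable_sort could be used if the original order of ties matters.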
