How to use std::sort to sort strings in a particular order

In one of my projects I need to sort strings according to a given order of characters.

This is my requirement:

Let's say my alphabet is b, B, A, a, d, c, C, D, T, t, in that order.

I have three strings: "Bat", "bat", "atb". After sorting, the array should be "bat", "Bat", "atb", because b < B < a in the order given above.

I am thinking of using std::sort from C++.

But I am not sure about the whole idea. If it is sound, what data structure can I use to store the alphabet order, and how should I write the compare function for the sort?

sort(arr, arr + 3, compare);

bool compare(string a, string b)
{
    // how to proceed here?
}

Are there other methods that would be more efficient than std::sort?

Any ideas would help.

A custom comparator and std::sort should be sufficient for what you're trying to do. The important part of the comparator is to ensure it follows a strict weak ordering. One of the properties of that ordering is this:

Given a and b, if !(a < b || b < a) is true, then a and b are treated as equivalent, and that equivalence must be transitive.
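
To make the requirement concrete, here is a minimal sketch that checks two of the strict weak ordering rules, irreflexivity and asymmetry, over a handful of sample strings. The names violatesStrictness, lessThan and lessOrEqual are hypothetical and not part of the question or the standard library. The check is deliberately partial: it does not test transitivity, so passing it is necessary but not sufficient.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if `comp` breaks irreflexivity or
// asymmetry on any pair drawn from `samples`. Transitivity is NOT checked.
template <typename Compare>
bool violatesStrictness(const std::vector<std::string>& samples, Compare comp)
{
    for (const auto& a : samples)
    {
        if (comp(a, a))                     // irreflexivity: comp(a, a) must be false
            return true;
        for (const auto& b : samples)
            if (comp(a, b) && comp(b, a))   // asymmetry: both cannot hold at once
                return true;
    }
    return false;
}

// A valid comparator: plain operator<.
inline bool lessThan(const std::string& a, const std::string& b) { return a < b; }

// An invalid comparator: <= claims that a string comes before itself.
inline bool lessOrEqual(const std::string& a, const std::string& b) { return a <= b; }
```

Called with a few sample strings, violatesStrictness reports no problem for lessThan and a violation for lessOrEqual. Handing a comparator like lessOrEqual to std::sort is undefined behavior, which is why the comparator in this answer only ever returns true for a strictly smaller left-hand side.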

Constructing a custom comparator as a functor is straightforward, and the functor is a good place to store your alphabet. To avoid scanning the alphabet string for every character (even a sorted alphabet would cost O(log N) per lookup), you can use a table that maps each char to its numeric sort position. This is very fast, particularly for long strings, because each char lookup takes constant time.
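
As a rough illustration of that trade-off, the sketch below puts the two lookup styles side by side. The names RankTable, makeRankTable, rankByTable and rankByScan are hypothetical, and the sketch assumes the alphabet contains no duplicate characters.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Rank table: one slot per possible unsigned char value.
using RankTable = std::array<int, (1 << CHAR_BIT)>;

// Hypothetical helper: builds the table once from the custom alphabet.
// Characters that are not in the alphabet keep the rank INT_MAX.
inline RankTable makeRankTable(const std::string& alphabet)
{
    RankTable table;
    table.fill(INT_MAX);
    int rank = 0;
    for (char c : alphabet)
        table[static_cast<unsigned char>(c)] = ++rank;
    return table;
}

// Constant-time lookup: a single array index.
inline int rankByTable(const RankTable& table, char c)
{
    return table[static_cast<unsigned char>(c)];
}

// Linear-time alternative: scans the alphabet on every call.
inline int rankByScan(const std::string& alphabet, char c)
{
    const std::size_t pos = alphabet.find(c);
    return pos == std::string::npos ? INT_MAX : static_cast<int>(pos) + 1;
}
```

Both functions return the same rank for any character. The difference is that rankByScan repeats the search for every character of every comparison, while rankByTable pays the cost once when the table is built.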

Example of a custom comparator follows:

#include <iostream>
#include <algorithm>
#include <string>
#include <climits>
#include <iterator>

struct CustomAlphaCmp
{
    int table[1 << CHAR_BIT];
    CustomAlphaCmp(const std::string& alpha)
    {
        std::fill(std::begin(table), std::end(table), INT_MAX);
        int value = 0;
        for (auto x : alpha)
            table[ static_cast<unsigned char>(x) ] = ++value;
    }

    bool operator()(const std::string& a, const std::string& b) const
    {
        auto lhs = a.begin();
        auto rhs = b.begin();

        for (; lhs != a.end() && rhs != b.end(); ++lhs,++rhs)
        {
            int lhs_val = table[static_cast<unsigned char>(*lhs)];
            int rhs_val = table[static_cast<unsigned char>(*rhs)];

            if (lhs_val != rhs_val)
                return lhs_val < rhs_val;
        }

        return (rhs != b.end());
    }
};

int main()
{
    std::string alpha = "bBAadcCDTt";
    std::string ar[] = { "Bat", "bat", "X", "atb", "bBb", "bbb", "B",
                         "bat", "aaa", "Y", "Cat", "CaT", "Bat", "A" };

    std::sort(std::begin(ar), std::end(ar), CustomAlphaCmp(alpha));

    for (auto const& s : ar)
        std::cout << s << '\n';
}

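Output (X and Y compare as equivalent and std::sort is not stable, so their relative order may vary between implementations)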

bbb
bBb
bat
bat
B
Bat
Bat
A
aaa
atb
CaT
Cat
Y
X

How it works

The comparator object is constructed from the custom alphabet. Construction initializes a table indexed by every possible char value, storing each alphabet character's position as its "value". All non-alphabet chars hold a value of INT_MAX, giving them the "weakest" possible order value and treating them all as equivalent.

Once that is complete, the comparator is handed to the sort algorithm. When two strings are compared, they are walked in parallel until a non-matching value is encountered or one or both of the strings end. If a mismatch is found, the smaller value decides the result. Otherwise, either both strings finished at the same time, the left finished first, or the right finished first, and we know all chars up to that point are equal. Only if the left side finished before the right is the left truly "less than" the right. If they finished together or the right side finished first (it does not matter which), the left side cannot be less. Therefore we can simply return whether the right side has not yet reached its end as the final answer.
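
The walk described here is the same rule that std::lexicographical_compare implements, so one possible variant is to let the standard algorithm do the walking and supply only a per-character comparison. The sketch below is an alternative under that assumption, not the comparator from this answer; the name LexAlphaCmp is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <string>

// Hypothetical variant: same rank table, but the character walk is
// delegated to std::lexicographical_compare.
struct LexAlphaCmp
{
    std::array<int, (1 << CHAR_BIT)> table;

    explicit LexAlphaCmp(const std::string& alpha)
    {
        table.fill(INT_MAX);
        int value = 0;
        for (char x : alpha)
            table[static_cast<unsigned char>(x)] = ++value;
    }

    bool operator()(const std::string& a, const std::string& b) const
    {
        // The lambda ranks single characters; the algorithm applies the
        // "a proper prefix is less than the longer string" rule itself.
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [this](char l, char r) {
                return table[static_cast<unsigned char>(l)]
                     < table[static_cast<unsigned char>(r)];
            });
    }
};
```

With the alphabet "bBAadcCDTt", this variant orders "bat" before "Bat" and "B" before "Bat", and it treats "X" and "Y" as equivalent, matching the behavior of the hand-written loop.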

This specific comparator does not distinguish between non-alphabet characters: any alphabet character is less than any non-alphabet character, and all non-alphabet characters are treated as equal to one another. If that isn't sufficient for your needs, some tweaking may be needed.
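
One possible tweak, sketched here as an assumption about what you might want rather than as part of the comparator above, is to keep non-alphabet characters after the alphabet but order them among themselves by their unsigned char value. The name TotalAlphaCmp is hypothetical, and the sketch assumes the alphabet has no duplicate characters.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical tweak: characters outside the alphabet still sort after
// every alphabet character, but are no longer equivalent to each other.
struct TotalAlphaCmp
{
    std::array<int, (1 << CHAR_BIT)> table;

    explicit TotalAlphaCmp(const std::string& alpha)
    {
        // Fallback rank: alphabet size + 1 + raw character value.
        const int base = static_cast<int>(alpha.size()) + 1;
        for (int c = 0; c < (1 << CHAR_BIT); ++c)
            table[static_cast<std::size_t>(c)] = base + c;

        // Alphabet characters get ranks 1..n, below every fallback rank.
        int value = 0;
        for (char x : alpha)
            table[static_cast<unsigned char>(x)] = ++value;
    }

    bool operator()(const std::string& a, const std::string& b) const
    {
        auto lhs = a.begin();
        auto rhs = b.begin();
        for (; lhs != a.end() && rhs != b.end(); ++lhs, ++rhs)
        {
            const int l = table[static_cast<unsigned char>(*lhs)];
            const int r = table[static_cast<unsigned char>(*rhs)];
            if (l != r)
                return l < r;
        }
        return rhs != b.end();   // true only if the left string ended first
    }
};
```

With this version, "X" sorts before "Y" deterministically, while every string made of alphabet characters keeps its position relative to the others.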

Finally, the preparation time for the comparator is a fixed fill cost for the table plus O(n) in the length of the alphabet. If you use the same alphabet for many sort operations, preparing one comparator ahead of time and passing it to each std::sort call may be warranted. Again, this may need some tweaking for your needs.
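
A minimal sketch of that reuse follows. It assumes a compact restatement of the comparator (the names AlphaCmp and sortAll are hypothetical) and passes the prepared object through std::cref, because std::sort takes its comparator by value and would otherwise copy the table on each call. Whether that copy matters in practice depends on your workload.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <functional>
#include <string>
#include <vector>

// Compact restatement of the table-based comparator (hypothetical name).
struct AlphaCmp
{
    std::array<int, (1 << CHAR_BIT)> table;

    explicit AlphaCmp(const std::string& alpha)
    {
        table.fill(INT_MAX);
        int value = 0;
        for (char x : alpha)
            table[static_cast<unsigned char>(x)] = ++value;
    }

    bool operator()(const std::string& a, const std::string& b) const
    {
        auto lhs = a.begin();
        auto rhs = b.begin();
        for (; lhs != a.end() && rhs != b.end(); ++lhs, ++rhs)
        {
            const int l = table[static_cast<unsigned char>(*lhs)];
            const int r = table[static_cast<unsigned char>(*rhs)];
            if (l != r)
                return l < r;
        }
        return rhs != b.end();
    }
};

// Sorts every batch with one comparator that is prepared a single time.
inline void sortAll(std::vector<std::vector<std::string>>& batches,
                    const std::string& alpha)
{
    const AlphaCmp cmp(alpha);   // table built once
    for (auto& batch : batches)
        std::sort(batch.begin(), batch.end(), std::cref(cmp));  // no table copy
}
```

std::cref wraps the comparator in a std::reference_wrapper, which is itself callable, so std::sort copies only the small wrapper and not the table.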

Regardless, best of luck.

Here's a way to customize your sort comparison operator:

struct comparisonFunction
{
    // Returns true when obj1 should be ordered before obj2.
    inline bool operator() (const YourCustomClass& obj1, const YourCustomClass& obj2) const
    {
        return (obj1.customValue < obj2.customValue);
    }
};

std::vector < YourCustomClass > vec;

std::sort(vec.begin(), vec.end(), comparisonFunction());
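
For a concrete picture, here is a small sketch of the same idea written with a lambda instead of a named functor. The type Item and the function sortByCustomValue are hypothetical stand-ins for YourCustomClass and your own code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for YourCustomClass.
struct Item
{
    std::string name;
    int customValue;
};

// Sorts items in ascending order of customValue.
inline void sortByCustomValue(std::vector<Item>& items)
{
    std::sort(items.begin(), items.end(),
              [](const Item& lhs, const Item& rhs) {
                  return lhs.customValue < rhs.customValue;
              });
}
```

A lambda keeps the comparison next to the call that uses it. A named functor is the better fit when the comparator carries state, such as the alphabet table in the answer above.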


 