简体   繁体   中英

Fastest way to Group a large number of Elements in C++

I need a way to quickly group a large number of connections, currently 300k, into groups where each group has a max number of elements allowed, currently 14k, and all of the connections in the same group cannot be connected to the same point. Basically, each connection is between two points and I need them grouped into buckets where the connections in a bucket don't share a point. Hopefully that makes sense.

Here's what I have so far, which works but is rather slow:

for (size_t i = 0; i < ConnectionGroups.size(); i++)
{
    auto& group = ConnectionGroups[i];
    if (group.size() < MaxConnectionGroupSize) // Has room for us...
    {
        int validGroupIdx = i;
        for (size_t gIdx = 0; gIdx < group.size(); gIdx++)
        {
            const auto groupConnection = ConnectionsQuickAccess[group[gIdx]];

            // Are we directly connected to one of the Connections in this group by one degree...
            if (Connection.Point1 == groupConnection->Point1 || Connection.Point1 == groupConnection->Point2 ||
                Connection.Point2 == groupConnection->Point1 || Connection.Point2 == groupConnection->Point2)
            {
                validGroupIdx = -1;
                break; // We are, check the next group
            }
        }

        if (validGroupIdx != -1)
        {
            ConnectionGroups[i].push_back(Connection.Slot);
            Connection.Group = i;
            return;
        }
        else
            continue;
    }
}

// All groups are full, create a new group
vector<int> newGroup;
newGroup.push_back(Connection.Slot);
ConnectionGroups.push_back(newGroup);

This code takes 29.68s to go through 300k connections, is there a faster way of doing this? Or maybe a different approach to this?

Thank you!

It seems the code posted processes one connection, ie, it is called n times where n is the number of connections. The algorithm is clearly O(n * n) : the time it takes to add new connections grows quadratic - something you generally don't want.

Unless memory is the primary constraints, I'd simple store for each group a hash containing all existing end-points and check against that, ie, something along the lines of

for (std:size_t i(0); i != ConnectionGroups.size(); ++i) {
    if (ConnectionGroups[i].size () < MaxConnectionGroupSize)
        && !ConnectionGroups[i].usesEndPoint(Connection.Point1)
        && !ConnectionGroups[i].usesEndPoint(Connection.Point2)) {
        ConnectionGroups[i].addEndPoint(Connection.Point1);
        ConnectionGroups[i].addEndPoint(Connection.Point2);
        ConnectionGroups[i].addConnection(Connection);
    }
}

Obviously, ConnectionGroups[i] would be a combination of connections and a hash of end-points accessed with the corresponding functions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM