How can I efficiently copy objects (or a range of objects) from vector A into vector B,
where vector B already contains certain objects identical to those from vector A,
so that no objects copied from vector A are already listed in vector B?
I have a graph stored as a vector of edges in std::vector<MinTreeEdge> minTreeInput.
I have a minimum spanning tree created from this graph, stored in std::vector<MinTreeEdge> minTreeOutput.
I'm trying to randomly add a certain number of edges back into minTreeOutput. To do this, I want to copy elements from minTreeInput back into minTreeOutput until the latter contains the required number of edges. Of course, each edge object that is copied over must not already be stored in minTreeOutput. Can't have duplicate edges in this graph.
Below is what I've come up with so far. It works, but it's really long and I know the loop will have to be run many times depending on the graph and tree. I'd like to know how to do this properly:
// Edge class
struct MinTreeEdge
{
// For std::unique() between objects
bool operator==(MinTreeEdge const &rhs) const noexcept
{
return lhs == rhs.lhs;
}
int lhs;
int node1ID;
int node2ID;
int weight;
......
};
......
// The usage
int currentSize = minTreeOutput.size();
int targetSize = currentSize + numberOfEdgesToReturn;
int sizeDistance = targetSize - currentSize;
while(sizeDistance != 0)
{
//Probably really inefficient
for(std::vector<MinTreeEdge>::iterator it = minTreeInput.begin(); it != minTreeInput.begin()+sizeDistance; ++it)
minTreeOutput.push_back(*it);
std::vector<MinTreeEdge>::iterator mto_it;
mto_it = std::unique (minTreeOutput.begin(), minTreeOutput.end());
currentSize = minTreeOutput.size();
sizeDistance = targetSize - currentSize;
}
Alternatively, is there a way to just list all the edges in minTreeInput (graph) that are not in minTreeOutput (tree) without having to check each individual element in the former against the latter?
If I understand the question correctly, this can be paraphrased to "how can I create a set union of two vectors?".
Answer: with std::set_union
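Note that this requires both vectors to be sorted, using the same ordering that you pass to std::set_union. This is for efficiency reasons, as you have already touched upon: with sorted inputs, the union is computed in a single linear pass.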
#include <vector>
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>   // std::move
struct MinTreeEdge
{
// For std::unique() between objects
bool operator==(MinTreeEdge const &rhs) const noexcept
{
return lhs == rhs.lhs;
}
int lhs;
int node1ID;
int node2ID;
int weight;
};
struct lower_lhs
{
bool operator()(const MinTreeEdge& l, const MinTreeEdge& r) const noexcept
{
return l.lhs < r.lhs;
}
};
std::vector<MinTreeEdge> merge(std::vector<MinTreeEdge> a,
std::vector<MinTreeEdge> b)
{
// let's pessimistically assume that the inputs are not sorted
// we could simply assert that they are if the caller is aware of
// the requirement
std::sort(a.begin(), a.end(), lower_lhs());
std::sort(b.begin(), b.end(), lower_lhs());
// alternatively...
// assert(std::is_sorted(a.begin(), a.end(), lower_lhs()));
// assert(std::is_sorted(b.begin(), b.end(), lower_lhs()));
// optional step if the inputs are not already `unique`
a.erase(std::unique(a.begin(), a.end()), a.end());
b.erase(std::unique(b.begin(), b.end()), b.end());
std::vector<MinTreeEdge> result;
result.reserve(a.size() + b.size());
std::set_union(a.begin(), a.end(),
b.begin(), b.end(),
std::back_inserter(result),
lower_lhs());
return result;
}
int main()
{
// example use case
auto a = std::vector<MinTreeEdge>{};
auto b = std::vector<MinTreeEdge>{};
b = merge(std::move(a), std::move(b));
}
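To see what std::set_union does with overlapping sorted inputs, here is a minimal sketch on plain ints (the helper name sorted_union is hypothetical). It also shows why the answer calls std::unique first: if a value occurs m times in one range and n times in the other, set_union keeps max(m, n) copies of it rather than exactly one.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: union of two sorted ranges in one linear pass.
// A value present in both ranges is emitted once (taken from `a`), but
// duplicates *within* a range survive, hence the std::unique step above.
std::vector<int> sorted_union(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());   // worst case: no overlap at all
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, sorted_union({1, 3, 5, 7}, {3, 4, 5}) yields {1, 3, 4, 5, 7}, while sorted_union({1, 1, 2}, {1, 3}) yields {1, 1, 2, 3} because the first range already held 1 twice.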
There has been some mention of sets to accomplish this. And it is fair to say that if MinTreeEdge is expensive to copy, then we could expect to see a performance benefit from using an unordered_set. In that case, though, we would probably want the temporary set to store references to the objects rather than copies of them.
I might do it this way:
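#include <functional>      // std::reference_wrapper, std::hash
#include <unordered_set>
#include <vector>
// utility class which converts unary and binary operations on
// a reference_wrapper into unary and binary operations on the
// referred-to objects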
template<class unary, class binary>
struct reference_as_object
{
template<class U>
decltype(auto) operator()(const std::reference_wrapper<U>& l) const {
return _unary(l.get());
}
template<class U, class V>
decltype(auto) operator()(const std::reference_wrapper<U>& l,
const std::reference_wrapper<V>& r) const {
return _binary(l.get(), r.get());
}
unary _unary;
binary _binary;
};
// utility to help prevent typos when defining a set of references
template<class K, class H, class C> using unordered_reference_set =
std::unordered_set<
std::reference_wrapper<K>,
reference_as_object<H, C>,
reference_as_object<H, C>
>;
// define unary and binary operations for our set. This way we can
// avoid polluting MinTreeEdge with artificial relational operators
struct mte_hash
{
std::size_t operator()(const MinTreeEdge& mte) const
{
return std::hash<int>()(mte.lhs);
}
};
struct mte_equal
{
bool operator()(MinTreeEdge const& l, MinTreeEdge const& r) const
{
return l.lhs == r.lhs;
}
};
// merge function. arguments by value since we will be moving
// *expensive to copy* objects out of them, and the vectors themselves
// can be *moved* into our function very cheaply
std::vector<MinTreeEdge> merge2(std::vector<MinTreeEdge> a,
std::vector<MinTreeEdge> b)
{
using temp_map_type = unordered_reference_set<MinTreeEdge, mte_hash, mte_equal>;
// build a set of references to existing objects in b
temp_map_type tmap;
tmap.reserve(a.size() + b.size()); // upper bound on the number of distinct edges
// b first, since the requirements mentioned 'already in B'
for (auto& ob : b) { tmap.insert(ob); }
// now add missing references in a
for (auto& oa : a) { tmap.insert(oa); }
// now build the result, moving objects from a and b as required
std::vector<MinTreeEdge> result;
result.reserve(tmap.size());
for (auto r : tmap) {
result.push_back(std::move(r.get()));
}
// a and b are now left holding moved-from objects (valid but in an
// unspecified state) plus the duplicates we skipped. In summary, they are
// of no further use to us, so they are simply dropped on return.
// Note that the order of `result` follows the hash table, not the inputs.
return result;
}
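For the question's closing alternative (listing the graph edges that are not in the tree), the same hashing idea can be applied to the edge ids alone, which sidesteps the reference_wrapper machinery entirely. This is a minimal sketch that assumes, as the question's operator== does, that the lhs field alone identifies an edge; edges_not_in_tree is a hypothetical name. It needs no sorting, runs in expected linear time, and keeps the graph's original order.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Minimal stand-in for the question's edge type.
struct MinTreeEdge
{
    int lhs;       // edge identity, as used by the question's operator==
    int node1ID;
    int node2ID;
    int weight;
};

// Hypothetical helper: edges of `graph` whose lhs does not occur in `tree`.
// One pass hashes the tree's ids, a second pass filters the graph, so each
// graph edge costs one expected O(1) lookup instead of a scan of the tree.
std::vector<MinTreeEdge> edges_not_in_tree(const std::vector<MinTreeEdge>& graph,
                                           const std::vector<MinTreeEdge>& tree)
{
    std::unordered_set<int> inTree;
    inTree.reserve(tree.size());
    for (const auto& e : tree)
        inTree.insert(e.lhs);

    std::vector<MinTreeEdge> missing;
    for (const auto& e : graph)
        if (inTree.count(e.lhs) == 0)
            missing.push_back(e);
    return missing;
}
```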
Let's say that we wanted to stick with the vector method (we almost always should), but that MinTreeEdge was a little expensive to copy. Say it uses a pimpl idiom for internal polymorphism which will inevitably mean a memory allocation on copy. But let's say that it's cheaply moveable. Let's also imagine that the caller cannot be expected to sort or uniqueify data before sending it to us.
We can still achieve good efficiency with standard algorithms and vectors:
std::vector<MinTreeEdge> merge(std::vector<MinTreeEdge> a,
std::vector<MinTreeEdge> b)
{
// sorts a range if not already sorted
// @return a reference to the range
auto maybe_sort = [] (auto& c) -> decltype(auto)
{
auto begin = std::begin(c);
auto end = std::end(c);
if (not std::is_sorted(begin, end, lower_lhs()))
std::sort(begin, end, lower_lhs());
return c;
};
// uniqueify a range, returning the new 'end' of
// valid data
// @pre c is sorted
// @return result of std::unique(...)
auto unique = [](auto& c) -> decltype(auto)
{
auto begin = std::begin(c);
auto end = std::end(c);
return std::unique(begin, end);
};
// turn an iterator into a move-iterator
auto mm = [](auto iter) { return std::make_move_iterator(iter); };
std::vector<MinTreeEdge> result;
result.reserve(a.size() + b.size());
// create a set_union from two input containers.
// @post a and b shall be in a valid but unspecified state
std::set_union(mm(a.begin()), mm(unique(maybe_sort(a))),
mm(b.begin()), mm(unique(maybe_sort(b))),
std::back_inserter(result),
lower_lhs());
return result;
}
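If one provides a free function void swap(MinTreeEdge& l, MinTreeEdge& r) noexcept, then std::sort and std::unique can rearrange the elements with cheap swaps and moves, and std::set_union performs exactly N moves into the result, where N is the size of the result set; no element is ever copied. Since in a pimpl class a move (or a swap) is simply a pointer exchange, this algorithm remains efficient.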
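The following sketch makes that claim checkable by substituting a hypothetical move-only pimpl type, PimplEdge, for MinTreeEdge. Because its copy operations are deleted, the sort, unique and set_union pipeline only compiles if every element travels by move or by the noexcept swap. The names PimplEdge, ById, sort_unique and merge_moves are all illustrative, not part of the answer above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pimpl-style edge: the data lives behind a unique_ptr, so moving
// or swapping only transfers a pointer. Copies are deleted on purpose: if any
// step below tried to copy an element, the code would fail to compile.
class PimplEdge
{
public:
    explicit PimplEdge(int id) : impl_(std::make_unique<Impl>(Impl{id})) {}
    PimplEdge(PimplEdge&&) noexcept = default;
    PimplEdge& operator=(PimplEdge&&) noexcept = default;
    PimplEdge(const PimplEdge&) = delete;
    PimplEdge& operator=(const PimplEdge&) = delete;

    // Moved-from objects hold a null pointer; report -1 instead of dereferencing it.
    int id() const noexcept { return impl_ ? impl_->id : -1; }

    // Free swap, found by argument-dependent lookup; exchanging pointers cannot throw.
    friend void swap(PimplEdge& l, PimplEdge& r) noexcept { std::swap(l.impl_, r.impl_); }

private:
    struct Impl { int id; };
    std::unique_ptr<Impl> impl_;
};

// Orders edges by id.
struct ById
{
    bool operator()(const PimplEdge& l, const PimplEdge& r) const noexcept
    {
        return l.id() < r.id();
    }
};

// Sorts v by id if needed, then compacts runs of equal ids to the front.
// Returns the new logical end; elements past it are valid but unspecified.
std::vector<PimplEdge>::iterator sort_unique(std::vector<PimplEdge>& v)
{
    if (!std::is_sorted(v.begin(), v.end(), ById()))
        std::sort(v.begin(), v.end(), ById());
    return std::unique(v.begin(), v.end(),
                       [](const PimplEdge& l, const PimplEdge& r) { return l.id() == r.id(); });
}

// Same shape as the answer's merge, but every element reaches `result` by move.
std::vector<PimplEdge> merge_moves(std::vector<PimplEdge> a, std::vector<PimplEdge> b)
{
    const auto aEnd = sort_unique(a);
    const auto bEnd = sort_unique(b);
    std::vector<PimplEdge> result;
    result.reserve(a.size() + b.size());   // no reallocation while appending
    std::set_union(std::make_move_iterator(a.begin()), std::make_move_iterator(aEnd),
                   std::make_move_iterator(b.begin()), std::make_move_iterator(bEnd),
                   std::back_inserter(result), ById());
    return result;
}
```

Calling merge_moves with ids {5, 1, 5} and {2, 1} (built with emplace_back, since the type cannot be copied out of an initializer list) should give the ids 1, 2, 5.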
Since your output vector should not contain duplicates, one way to avoid storing duplicates is to change the output container to a std::set<MinTreeEdge> instead of a std::vector<MinTreeEdge>. The reason is that a std::set does not store duplicates, so you do not have to write the duplicate check yourself.
First, you need to define an operator< for your MinTreeEdge class:
struct MinTreeEdge
{
// For std::unique() between objects
bool operator==(MinTreeEdge const &rhs) const noexcept
{
return lhs == rhs.lhs;
}
// For ordering (and duplicate detection) inside std::set
bool operator<(MinTreeEdge const &rhs) const noexcept
{
return lhs < rhs.lhs;
}
//...
};
Once you do that, the while loop can be replaced with the following:
#include <set>
#include <vector>
#include <iterator>
#include <algorithm>
//...
std::vector<MinTreeEdge> minTreeInput;
//...
std::set<MinTreeEdge> minTreeOutput;
//...
std::copy(minTreeInput.begin(), minTreeInput.end(),
std::inserter(minTreeOutput, minTreeOutput.begin()));
There is no need to call std::unique at all, since it is the std::set that checks for duplicates.
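A small sketch of that duplicate check, assuming the operator< defined above (ordering on lhs only): std::set decides membership by equivalence, meaning neither a < b nor b < a, so a second edge with the same lhs is rejected even if its other fields differ. add_edge is a hypothetical helper; the bool it returns is the second member of the pair that std::set::insert returns.

```cpp
#include <cassert>
#include <set>

// Minimal edge type; ordering (and therefore set membership) uses lhs only.
struct MinTreeEdge
{
    bool operator<(MinTreeEdge const& rhs) const noexcept { return lhs < rhs.lhs; }
    int lhs;
    int node1ID;
    int node2ID;
    int weight;
};

// Hypothetical helper: insert an edge and report whether it was new.
// std::set::insert returns {iterator, bool}; the bool is false when an
// equivalent element is already present, and the set is left unchanged.
bool add_edge(std::set<MinTreeEdge>& s, const MinTreeEdge& e)
{
    return s.insert(e).second;
}
```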
If the output container has to stay a std::vector, you can still do the above using a temporary std::set and then copy the std::set to the output vector:
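std::vector<MinTreeEdge> minTreeInput;
std::vector<MinTreeEdge> minTreeOutput;
//...
// seed the set with the tree's edges so they count as "already present"
std::set<MinTreeEdge> tempSet(minTreeOutput.begin(), minTreeOutput.end());
std::copy(minTreeInput.begin(), minTreeInput.end(),
std::inserter(tempSet, tempSet.begin()));
// replace (rather than append to) the output, so no edge appears twice
minTreeOutput.assign(tempSet.begin(), tempSet.end());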
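The std::copy calls above pull in every edge of minTreeInput, whereas the question only wants numberOfEdgesToReturn extra edges. One possible variant, sketched here with a hypothetical add_missing_edges helper and the same lhs-based operator<, keeps the output as a vector and stops as soon as enough new edges have been appended. It takes edges in input order, so shuffling minTreeInput first (for example with std::shuffle) would be needed for a random selection.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct MinTreeEdge
{
    bool operator<(MinTreeEdge const& rhs) const noexcept { return lhs < rhs.lhs; }
    int lhs;
    int node1ID;
    int node2ID;
    int weight;
};

// Hypothetical helper: append edges from `input` that are not already in
// `output`, stopping once `extra` new edges were added or `input` runs out.
void add_missing_edges(const std::vector<MinTreeEdge>& input,
                       std::vector<MinTreeEdge>& output,
                       std::size_t extra)
{
    std::set<MinTreeEdge> seen(output.begin(), output.end());
    const std::size_t target = output.size() + extra;
    for (const auto& e : input)
    {
        if (output.size() >= target)
            break;
        if (seen.insert(e).second)   // true only if no edge with this lhs was seen
            output.push_back(e);
    }
}
```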
You may use the following:
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <vector>
struct MinTreeEdge
{
bool operator<(MinTreeEdge const &rhs) const noexcept
{
return id < rhs.id;
}
int id;
int node1ID;
int node2ID;
int weight;
};
std::vector<MinTreeEdge> CreateRandomGraph(const std::vector<MinTreeEdge>& minSpanningTree,
const std::vector<MinTreeEdge>& wholeTree,
std::mt19937& rndEng,
std::size_t expectedSize)
{
assert(std::is_sorted(minSpanningTree.begin(), minSpanningTree.end()));
assert(std::is_sorted(wholeTree.begin(), wholeTree.end()));
assert(minSpanningTree.size() <= expectedSize);
assert(expectedSize <= wholeTree.size());
std::vector<MinTreeEdge> res;
std::set_difference(wholeTree.begin(), wholeTree.end(),
minSpanningTree.begin(), minSpanningTree.end(),
std::back_inserter(res));
std::shuffle(res.begin(), res.end(), rndEng);
res.resize(expectedSize - minSpanningTree.size());
res.insert(res.end(), minSpanningTree.begin(), minSpanningTree.end());
// std::sort(res.begin(), res.end());
return res;
}
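CreateRandomGraph asserts that both inputs are already sorted. A hypothetical wrapper such as CreateRandomGraphUnsorted below sorts copies first, so callers holding unsorted vectors can use it directly; the answer's function is repeated only so the example compiles on its own. Seeding std::mt19937 with a fixed value makes the random choice reproducible, which is handy in tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

struct MinTreeEdge
{
    bool operator<(MinTreeEdge const& rhs) const noexcept { return id < rhs.id; }
    int id;
    int node1ID;
    int node2ID;
    int weight;
};

// The answer's function, repeated so this example is self-contained.
std::vector<MinTreeEdge> CreateRandomGraph(const std::vector<MinTreeEdge>& minSpanningTree,
                                           const std::vector<MinTreeEdge>& wholeTree,
                                           std::mt19937& rndEng,
                                           std::size_t expectedSize)
{
    assert(std::is_sorted(minSpanningTree.begin(), minSpanningTree.end()));
    assert(std::is_sorted(wholeTree.begin(), wholeTree.end()));
    assert(minSpanningTree.size() <= expectedSize);
    assert(expectedSize <= wholeTree.size());
    std::vector<MinTreeEdge> res;
    std::set_difference(wholeTree.begin(), wholeTree.end(),
                        minSpanningTree.begin(), minSpanningTree.end(),
                        std::back_inserter(res));
    std::shuffle(res.begin(), res.end(), rndEng);
    res.resize(expectedSize - minSpanningTree.size());
    res.insert(res.end(), minSpanningTree.begin(), minSpanningTree.end());
    return res;
}

// Hypothetical wrapper: takes the inputs by value and sorts them, so the
// sorted-input precondition of CreateRandomGraph always holds.
std::vector<MinTreeEdge> CreateRandomGraphUnsorted(std::vector<MinTreeEdge> tree,
                                                   std::vector<MinTreeEdge> graph,
                                                   std::mt19937& rndEng,
                                                   std::size_t expectedSize)
{
    std::sort(tree.begin(), tree.end());
    std::sort(graph.begin(), graph.end());
    return CreateRandomGraph(tree, graph, rndEng, expectedSize);
}
```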