C++: how to compare several vectors, then make a new sorted vector that contains ALL elements of all vectors

Update: I have a couple of questions, probably silly ones, about 6502's answer (below). If anyone could help, I'd really appreciate it.

1) I understand that data1 and data2 are the maps, but I don't understand what allkeys is for. Can anyone explain?

2) I know that data1[vector1[i].name] = vector1[i].value; means assigning a value in the map of interest under the correct label... But I don't understand vector1[i].name and vector1[i].value. Aren't "name" and "value" two separate vectors of labels and values? So what are they doing on vector1? Shouldn't this read name[i] and value[i] instead?

Thanks everyone.


I have written code for performing a calculation. The code uses data from elsewhere. The calculation code is fine, but I'm having trouble manipulating the data.

The data exist as sets of vectors. Each set has one vector of labels (names, which are strings) and a corresponding vector of values (doubles or ints).

The problem is that I need each data set to have the same name/label in the same column as the other data sets. This problem is not the same as sorting the data in the vectors (which I know how to do), because sometimes names/labels can be missing from some vectors.

For example:

Data set 1:

vector names1 = Jim, Tom, Mary

vector values1 = 1 2 3

Data set 2:

vector names2 = Tom, Mary, Joan

vector values2 = 2 3 4

I want ONE name vector that has all possible names, and I want each corresponding value vector to be ordered the SAME way, with 0 where a name is missing from that set (pseudo-code):

vector namesUniversal = Jim, Joan, Mary, Tom

vector valuesUniversal1 = 1 0 3 2

vector valuesUniversal2 = 0 4 3 2

What I want to do is come up with a universal vector that contains ALL the labels/names sorted alphabetically, with all the corresponding numerical data arranged in the same order.

Can anyone tell me whether there is an elegant way to do this in C++? I guess I could compare each element of each name vector with each element of every other name vector, but this seems quite clunky, and I would not know how to get the data into the right columns of the corresponding data vectors. Thanks for any advice.

The algorithm you are looking for is usually called "merging". Basically, you sort the two data sets by key and then compare their current elements pairwise: if the keys are equal, you process and output the pair; otherwise you process and advance only the one with the smaller key.

You must also handle the case where one of the two lists ends before the other (this can be avoided by appending sentinel values to both lists that are guaranteed to compare higher than any value you need to process).

The following is pseudocode for merging:

  1. Sort vector1 by name
  2. Sort vector2 by name
  3. Set index1 = index2 = 0;
  4. Loop until both index1 >= vector1.size() and index2 >= vector2.size() (in other words, until both vectors are exhausted)
  5. If index1 == vector1.size() (i.e. if vector1 has been fully processed), output vector2[index2++]
  6. Otherwise, if index2 == vector2.size() (i.e. if vector2 has been fully processed), output vector1[index1++]
  7. Otherwise, if vector1[index1] == vector2[index2], output the merged data and increment both index1 and index2
  8. Otherwise, if vector1[index1] < vector2[index2], output vector1[index1++]
  9. Otherwise, output vector2[index2++]
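The numbered steps above can be sketched in C++ roughly as follows. This is a minimal sketch, assuming each data set has first been zipped into a vector of (name, value) pairs, that names are unique within a set, and that a name missing from a set should produce 0; Entry, Merged and mergeByName are hypothetical names, not part of the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed record layout: (name, value). Hypothetical alias.
using Entry = std::pair<std::string, int>;

// Hypothetical result type: the universal name vector plus one aligned
// value vector per input set (0 where a name is missing from that set).
struct Merged {
    std::vector<std::string> names;
    std::vector<int> values1;
    std::vector<int> values2;
};

// Two-way merge following steps 1-9: sort both sets by name, then walk
// them with one index each, always consuming the smaller name first.
Merged mergeByName(std::vector<Entry> v1, std::vector<Entry> v2) {
    std::sort(v1.begin(), v1.end());  // pair's operator< compares names first
    std::sort(v2.begin(), v2.end());

    Merged out;
    auto emit = [&out](const std::string& name, int a, int b) {
        out.names.push_back(name);
        out.values1.push_back(a);
        out.values2.push_back(b);
    };

    std::size_t i1 = 0, i2 = 0;
    while (i1 < v1.size() || i2 < v2.size()) {      // until both are exhausted
        if (i1 == v1.size()) {                      // only set 2 remains
            emit(v2[i2].first, 0, v2[i2].second);
            ++i2;
        } else if (i2 == v2.size()) {               // only set 1 remains
            emit(v1[i1].first, v1[i1].second, 0);
            ++i1;
        } else if (v1[i1].first == v2[i2].first) {  // name in both sets
            emit(v1[i1].first, v1[i1].second, v2[i2].second);
            ++i1;
            ++i2;
        } else if (v1[i1].first < v2[i2].first) {   // name only in set 1
            emit(v1[i1].first, v1[i1].second, 0);
            ++i1;
        } else {                                    // name only in set 2
            emit(v2[i2].first, 0, v2[i2].second);
            ++i2;
        }
    }
    return out;
}
```

With the question's example (Jim, Tom, Mary / 1 2 3 and Tom, Mary, Joan / 2 3 4), this yields the names Jim, Joan, Mary, Tom with values 1 0 3 2 and 0 4 3 2.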

However, in C++ you can implement a solution that is much easier to write and is probably still fast enough (warning: untested code!):

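#include <iostream>
#include <map>
#include <set>
#include <string>

// vector1 and vector2 are assumed to hold records with a
// std::string `name` member and an int `value` member.
std::map<std::string, int> data1, data2;  // name -> value, one map per data set
std::set<std::string> allkeys;            // union of all names, kept sorted

for (int i=0,n=vector1.size(); i<n; i++)
{
    allkeys.insert(vector1[i].name);
    data1[vector1[i].name] = vector1[i].value;
}

for (int i=0,n=vector2.size(); i<n; i++)
{
    allkeys.insert(vector2[i].name);
    data2[vector2[i].name] = vector2[i].value;
}

// Visit names in alphabetical order; operator[] inserts and returns 0
// for a name that is missing from a map.
for (std::set<std::string>::iterator i=allkeys.begin(), e=allkeys.end();
     i!=e; ++i)
{
   const std::string& key = *i;
   std::cout << key << ' ' << data1[key] << ' ' << data2[key] << std::endl;
}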

The idea is to build two maps, data1 and data2, from names to values, and at the same time collect all the keys that appear into a std::set named allkeys (adding the same name to a set multiple times has no effect).

After the collection phase, this set can be iterated to find all the names that have been observed, and for each name the value can be retrieved from the data1 and data2 maps (std::map<std::string, int>::operator[] inserts a value-initialized entry, i.e. 0, and returns it when you look up a name that has not been added to the map).

Technically this is overkill (it uses three balanced trees for processing that would require just two sort operations), but it is less code and probably acceptable anyway.
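The map/set approach above can also be adapted to the question's own layout, where each set is two parallel vectors: the loop then reads names[i] and values[i] instead of vector1[i].name and vector1[i].value. This is a minimal sketch, generalized to any number of data sets; DataSet, Aligned and alignByName are hypothetical names, and it uses find() instead of operator[] so that looking up a missing name does not insert it.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One data set in the question's layout: values[i] belongs to names[i].
struct DataSet {
    std::vector<std::string> names;
    std::vector<int> values;
};

// The sorted union of all names plus one aligned value row per data set.
struct Aligned {
    std::vector<std::string> names;
    std::vector<std::vector<int>> rows;
};

Aligned alignByName(const std::vector<DataSet>& sets) {
    std::set<std::string> allkeys;  // every name seen, sorted, no duplicates
    std::vector<std::map<std::string, int>> data(sets.size());  // name -> value per set

    for (std::size_t s = 0; s < sets.size(); ++s) {
        const DataSet& ds = sets[s];
        for (std::size_t i = 0; i < ds.names.size() && i < ds.values.size(); ++i) {
            allkeys.insert(ds.names[i]);
            data[s][ds.names[i]] = ds.values[i];  // names[i] / values[i]
        }
    }

    Aligned out;
    out.names.assign(allkeys.begin(), allkeys.end());
    out.rows.resize(sets.size());
    for (std::size_t s = 0; s < sets.size(); ++s) {
        for (const std::string& key : out.names) {
            auto it = data[s].find(key);  // find() never inserts
            out.rows[s].push_back(it != data[s].end() ? it->second : 0);
        }
    }
    return out;
}
```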

6502's solution looks fine at first glance. You should probably use std::merge for the merging part, though.
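A sketch of building the sorted union of names with std::merge, under the assumption that only the name vector is needed here (the values still have to be looked up per name). std::merge keeps both copies of a name that appears in both inputs, so this version removes the adjacent duplicates with std::unique afterwards; std::set_union does both steps at once. unionOfNames is a hypothetical helper.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Sorted union of two name lists: sort, merge, then drop adjacent duplicates.
std::vector<std::string> unionOfNames(std::vector<std::string> a,
                                      std::vector<std::string> b) {
    std::sort(a.begin(), a.end());  // std::merge requires sorted inputs
    std::sort(b.begin(), b.end());

    std::vector<std::string> merged;
    merged.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(merged));

    // A name present in both inputs now appears twice in a row.
    merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
    return merged;
}
```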

EDIT:

I forgot to mention that the GNU implementation of the STL now also provides a multiway_merge extension. It is part of the parallel mode, so it resides in the namespace __gnu_parallel. If you need to do multiway merging, it will be very hard to come up with something as fast or as simple to use.

A quick way that comes to mind is to use a map<pair<string, int>, int> and store each value in the map under the right key. For example, (Tom, 2) in the first value set would be stored under the key (Tom, 1) with value 2. Once the map is ready, iterate over it and build whatever data structure you want (assuming the map itself is not enough for you).
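A minimal sketch of that map-based approach, assuming 0-based data-set indices (the example above numbers sets from 1), 0 as the value for a missing name, and that every set index stored in the map is smaller than numSets. Key, Universal and buildUniversal are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Key = (name, data-set index); mapped value = that set's number for the name.
using Key = std::pair<std::string, int>;

// Hypothetical result type in the question's "universal" layout.
struct Universal {
    std::vector<std::string> names;        // sorted union of names
    std::vector<std::vector<int>> values;  // values[set][column], 0 if missing
};

// std::map orders pair keys by name first and set index second, so one
// pass visits names alphabetically with each name's entries adjacent.
Universal buildUniversal(const std::map<Key, int>& m, std::size_t numSets) {
    Universal u;
    u.values.resize(numSets);
    for (const auto& kv : m) {
        const std::string& name = kv.first.first;
        if (u.names.empty() || u.names.back() != name) {  // first entry for a new name
            u.names.push_back(name);
            for (auto& row : u.values) row.push_back(0);   // default every set to 0
        }
        std::size_t set = static_cast<std::size_t>(kv.first.second);
        u.values[set].back() = kv.second;                  // fill this set's column
    }
    return u;
}
```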

I think you need to alter how you store this data. It looks like you're saying each number is logically associated with the name in the same position: Jim = 1, Mary = 3, etc.

If so, and you want to stick with a vector of some kind, you could redo your data structure like so:

#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> NameNumberPair;  // (name, number)
typedef std::vector<NameNumberPair> NameNumberVector;

NameNumberVector v1;

You'll need to order the pairs by their names. std::pair's built-in operator< already compares the names (first) before the numbers (second), so std::sort orders such a vector by name as-is; pass your own comparator only if you need a different order. However, as Nawaz points out, a map would be a better way to represent the associated nature of the data.
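A sketch of sorting such a vector by name with an explicit comparator passed to std::stable_sort, assuming that entries with equal names should keep their original relative order; sortByName is a hypothetical helper. Plain std::sort(v.begin(), v.end()) would also order the entries by name, because std::pair's operator< compares first before second.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> NameNumberPair;
typedef std::vector<NameNumberPair> NameNumberVector;

// Orders entries by name only; stable_sort keeps equal names in input order.
void sortByName(NameNumberVector& v) {
    std::stable_sort(v.begin(), v.end(),
                     [](const NameNumberPair& a, const NameNumberPair& b) {
                         return a.first < b.first;
                     });
}
```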
