
C++ splitting two dimensional vector into groups

I'm trying to implement a GROUP BY operation (as in SQL) over a two-dimensional vector of strings, which represents the data source.

I'm allowing the user to select which field to group by. I don't know the best way to achieve this.

I don't want to group if the values in the selected field aren't consistent enough. Example:

ID  | name  | type
1   | Sam   | a
2   | Alex  | b
3   | Tom   | b
4   | Ryan  | a

With the above example, grouping by name shouldn't pass because there is too much variability in the data, whereas type is a valid choice. How could I implement this kind of check? I was thinking of keeping track of how many instances of each value of the group field there are.

Would it be unnecessary to store each group in its own vector?

Let's answer your first question: how do you determine whether an attribute is valid to group on?

You want low variability, so you need a metric that tells you whether you should be able to group by that attribute.

A very simple metric would be to find the number of unique elements in an attribute and divide by the total number of elements in that attribute. (A score of 1 means all elements are different; a score of 1/(number of elements) means all elements are the same.)

You can then set a threshold on that number to decide whether or not to group on an attribute.

In your example:

name has 4 unique elements out of 4 elements, so its score would be 1.
type has 2 unique elements out of 4 elements, so its score would be 0.5.

Note that this metric may perform poorly on small data sets.
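A minimal sketch of this metric, assuming the data is a vector of rows where each row is a vector of strings. The names `Table`, `uniquenessRatio` and `canGroupBy`, and the default threshold of 0.5, are hypothetical choices for illustration and not part of the original question.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Assumed layout: one inner vector per row, one string per field.
using Table = std::vector<std::vector<std::string>>;

// Ratio of distinct values to total rows in one column.
// 1.0 means every value is different; 1.0 / rows.size() means all are the same.
// An empty table returns 1.0 so that it is never treated as groupable.
double uniquenessRatio(const Table& rows, std::size_t column) {
    if (rows.empty()) return 1.0;
    std::unordered_set<std::string> distinct;
    for (const auto& row : rows) {
        distinct.insert(row.at(column));  // at() throws if a row is too short
    }
    return static_cast<double>(distinct.size()) /
           static_cast<double>(rows.size());
}

// Hypothetical policy: allow grouping only when the ratio is at or below
// the threshold. The default of 0.5 is an arbitrary illustrative value.
bool canGroupBy(const Table& rows, std::size_t column, double threshold = 0.5) {
    return uniquenessRatio(rows, column) <= threshold;
}
```

With the four example rows, `uniquenessRatio` gives 1 for the name column (index 1) and 0.5 for the type column (index 2), so with this threshold only type would be accepted.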

No, it's not necessary to store each attribute in its own vector (but it will work). Other solutions: create a struct/class to hold your data and store instances of that class in a vector.

vector[0] => {id: 1, name: Sam, type: a}
vector[1] => {id: 2, name: Alex, type: b}
vector[2] => {id: 3, name: Tom, type: b}
vector[3] => {id: 4, name: Ryan, type: a}

You could then group by sorting on a specific key (for example, on type).
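The struct-and-sort idea might look something like the sketch below. The `Record` struct and the `sortByType` function are hypothetical names, and the field types are assumed from the example table.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type mirroring the example table.
struct Record {
    int id;
    std::string name;
    std::string type;
};

// Sort so that records with the same type end up next to each other.
// stable_sort keeps the original relative order inside each group.
void sortByType(std::vector<Record>& records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) {
                         return a.type < b.type;
                     });
}
```

After sorting, each group is a contiguous run of records, so you can walk the vector once and start a new group whenever the key changes.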

or

Create a separate hash or map for each attribute you group on. Each entry in the hash/map will store a list of pointers to your objects.

type_hash [0] => List of pointers to data objects with type a
type_hash [1] => List of pointers to data objects with type b
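A rough sketch of the map-of-pointers variant, assuming the same kind of hypothetical `Record` struct as in the example rows. Here `GroupIndex` and `groupByType` are illustrative names, and `std::map` is used so the groups come out ordered by key; `std::unordered_map` would work as well.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record type mirroring the example table.
struct Record {
    int id;
    std::string name;
    std::string type;
};

// Maps each distinct type value to the records that carry it.
using GroupIndex = std::map<std::string, std::vector<const Record*>>;

// Builds the index without copying any records.
// The pointers refer to elements of `records`, so they become invalid
// if that vector is later resized, reallocated or destroyed.
GroupIndex groupByType(const std::vector<Record>& records) {
    GroupIndex groups;
    for (const auto& record : records) {
        groups[record.type].push_back(&record);
    }
    return groups;
}
```

The size of each list also gives you the per-group instance count mentioned in the question, so the same structure can feed the variability check.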

