Select a set of values from a tuple with a run-time index

Question

A short introduction to my question: I'm trying to implement a sort of relational database using STL containers. This is just for fun and educational purposes, so there is no need for answers like "use this library" or "this is absolutely useless". I know the title is a little confusing at this point, but we will get there (suggestions for improving the title are very welcome).

I proceeded in small steps:

1. I can build a table as a vector of maps from column names to their values => std::vector<std::map<std::string, some_variant>>. It's simple and it represents what I need.
2. Wait, I can store the column names just once and access values by their index => std::vector<std::vector<some_variant>>. As simple as point 1, but faster.
3. Wait, wait: in a database a table is literally a sequence of tuples => std::vector<std::tuple<args...>>. This is cool: it represents exactly what I'm doing, with the correct types and no variant, and it is even faster than the others.
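
To make the three layouts concrete, here is a minimal sketch of how they could be declared. The names cell, table_1, table_2, table_3 and column_index are hypothetical and only serve the illustration; some_variant in the list above is assumed to be a std::variant over the column types.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

// Hypothetical cell type shared by layouts 1 and 2.
using cell = std::variant<long, long long, double, std::string>;

// Layout 1: every row repeats the column names as map keys.
using table_1 = std::vector<std::map<std::string, cell>>;

// Layout 2: column names are stored once, rows hold values by position.
struct table_2 {
    std::vector<std::string> columns;
    std::vector<std::vector<cell>> rows;

    // Linear search; assumed to be good enough for a handful of columns.
    std::size_t column_index(const std::string& name) const
    {
        for (std::size_t i = 0; i < columns.size(); ++i)
            if (columns[i] == name) return i;
        throw std::out_of_range("unknown column: " + name);
    }
};

// Layout 3: the row type is fixed at compile time, so no variant is needed.
using table_3 = std::vector<std::tuple<long, long, double>>;
```

With layout 2 a name is resolved to a position once and then reused for every row, which is the step that layout 3 cannot do directly because std::get needs its index at compile time.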

Note: the "faster than" was measured for 1000000 records with a simple loop like this:

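#include <iostream>
#include <map>
#include <random>
#include <string>
#include <variant>
#include <vector>

// Shared random sources for all fill functions.
std::random_device dev;
std::mt19937 gen(dev());
std::uniform_int_distribution<long> rand1_1000(1, 1000);
std::uniform_real_distribution<double> rand1_10(1.0, 10.0);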

void fill_1()
{
    using my_variant = std::variant<long, long long, double, std::string>;
    using values = std::map<std::string, my_variant>;
    using table = std::vector<values>;

    table t;
    for (int i = 0; i < 1000000; ++i)
        t.push_back({ {"col_1", rand1_1000(gen)}, {"col_2", rand1_1000(gen)}, {"col_3", rand1_10(gen)} });
    std::cout << "size:" << t.size() << "\n";//just to prevent optimization
}

1. 2234101600ns - avg:2234
2. 446344100ns - avg:446
3. 132075400ns - avg:132
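
The measurement code itself is not shown above, so the following is only a sketch of how the tuple-based variant and the timing could look. fill_3 and time_ns are hypothetical names, and the use of std::chrono::steady_clock is an assumption, not necessarily what produced the numbers above.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <tuple>
#include <vector>

using row = std::tuple<long, long, double>;

// Hypothetical counterpart of fill_1 for the tuple layout. The table is
// returned so that the compiler cannot discard the work.
std::vector<row> fill_3(std::size_t n, std::mt19937& gen)
{
    std::uniform_int_distribution<long> rand1_1000(1, 1000);
    std::uniform_real_distribution<double> rand1_10(1.0, 10.0);

    std::vector<row> t;
    for (std::size_t i = 0; i < n; ++i)
        t.emplace_back(rand1_1000(gen), rand1_1000(gen), rand1_10(gen));
    return t;
}

// Hypothetical helper: wall-clock duration of one call to f, in nanoseconds.
template <typename F>
long long time_ns(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A call such as time_ns([&] { t = fill_3(1000000, gen); }) would then give one sample; averaging several runs smooths out allocator and cache noise.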

INSERT: No problem with any of these solutions; an insert is as simple as pushing back elements, as in the example.

SELECT: 1 and 2 are simple, but 3 is tricky.

So, finally, the questions:

Memory usage: solutions 1 and 2 carry a lot of memory overhead, so 3 again seems to be the right choice here. For the example with 1 million records of 2 longs and a double, I was expecting something near 4MB*2 for the longs and 8MB for the doubles, plus some overhead for the vectors, maps and variants where used. Instead we have (measured with Windows Task Manager, not extremely accurate, I know):
1. 340 MB
2. 120 MB
3. 31 MB
Am I missing something, other than reserving the right size in advance or calling shrink_to_fit after the insert loop?
Is there a way to retrieve some tuple fields at run time, as in the case of a select statement?

using my_tuple = std::tuple<long, long, std::string, double>;
std::vector<my_tuple> table;
int to_select; // this could obviously be a vector of columns to select
std::cin >> to_select;
auto result = select(table, to_select); // the function I would like to write

Do you see any way to implement this last line? There are two problems as far as I can see: the result type has to be derived from the types of the starting tuple, and then the selection of the desired fields actually has to be performed.

I have read a lot of answers about this; they all deal with contiguous indexes using make_index_sequence or with an index known at compile time. I also found this article, very interesting, but not really useful for this case.
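
On the memory question, a rough way to see where the bytes go is to compare sizeof for each row type with the capacity a vector ends up owning. The sketch below assumes the same column types as the benchmark; cell, row_3, owned_bytes and make_rows are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

using cell  = std::variant<long, long long, double, std::string>;
using row_3 = std::tuple<long, long, double>;

// Bytes of element storage a vector currently owns (capacity, not size).
template <typename T>
std::size_t owned_bytes(const std::vector<T>& v)
{
    return v.capacity() * sizeof(T);
}

// Hypothetical helper: build n rows, optionally reserving the space first.
std::vector<row_3> make_rows(std::size_t n, bool reserve_first)
{
    std::vector<row_3> t;
    if (reserve_first) t.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        t.emplace_back(0L, 0L, 0.0);
    return t;
}
```

Where long is 4 bytes, sizeof(row_3) is typically 16, so a million rows need about 16 MB of payload. Without reserve the vector grows geometrically, so its capacity can end up well above its size, and while it reallocates the old and new buffers exist at the same time. A cell has to be large enough for its biggest alternative, here std::string, plus a discriminator, and every map entry is a separate heap node, which would be consistent with layouts 1 and 2 using far more memory than the raw column data suggests.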

Answer 1

This is doable, but it is strange:

template<size_t candidate, typename ...T>
constexpr std::variant<T...> helperTupleValueAt(const std::tuple<T...>& t, size_t index)
{
    if constexpr (candidate >= sizeof...(T)) {
        throw std::logic_error("out of bounds");
    } else {
        if (candidate == index) {
            return std::variant<T...>{ std::in_place_index<candidate>, std::get<candidate>(t) };
        } else {
            return helperTupleValueAt<candidate + 1>(t, index);
        }
    }
}

template<typename ...T>
std::variant<T...> tupleValueAt(const std::tuple<T...>& t, size_t index)
{
    return helperTupleValueAt<0>(t, index);
}

https://wandbox.org/permlink/FQJd4chAFVSg5eSy

Answer 1: score 2, accepted, 2019-07-04 10:50:02
