How can I return a variable number of containers of various types?

Question

I have data that looks like this:

     token            eps  rank # first line names columns
 Intercept   9.362637e+00     1 # later lines hold data
        A1  -2.395553e-01    30
        G1  -3.864725e-01    50
        T1   1.565497e-01    43
....

Different files will have different numbers of named columns, and the types of values in each column will vary among floats, ints, and strings.

I want to write a readCols function to which I send the names of columns (e.g., I may want the token and rank columns) and which puts the data in each specified column into a container of the appropriate type.

My problem is not in parsing the file but in returning a variable number of containers that hold different types. For instance, I want the token and rank columns put into vector<string> and vector<int> containers, respectively. The issue is that I may want the eps column instead (stored in a vector of floating-point values), and I don't want to write a different readCols function for every conceivable combination of types. (The type of container doesn't matter to me. If I have to use only vectors, no problem; what matters is that each container holds a different type.)

I'll probably need a container that holds different types to hold the different types of container. It looks like Boost.Variant might be the solution I want, but I don't know how to tell the parser which type I want each column to be (could I make something like a list of typenames? e.g. void readCols(string filename, vector<variant<various types of vector>> &data, vector<string> colNames, vector<typename> convertTo) ). Likewise, Boost.Mpl.Vector may solve the problem, but again I can't quite figure out how to tell readCols how each column should be cast.
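The "list of typenames" idea can be approximated with a runtime tag per column, since vector<typename> is not valid C++. The following is a minimal sketch under several assumptions: it uses C++17 std::variant as a stand-in for Boost.Variant, it reads from a std::istream rather than a file name, it treats fields as whitespace-separated and drops anything after a '#', and the names ColType, Column, and splitFields are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical names throughout; std::variant (C++17) stands in for Boost.Variant.
enum class ColType { String, Int, Double };

using Column = std::variant<std::vector<std::string>,
                            std::vector<int>,
                            std::vector<double>>;

// Split one line into whitespace-separated fields, ignoring a trailing '#' comment.
inline std::vector<std::string> splitFields(const std::string& line) {
    std::istringstream in(line.substr(0, line.find('#')));
    std::vector<std::string> fields;
    for (std::string f; in >> f;) fields.push_back(f);
    return fields;
}

// Read the requested columns in a single pass; each entry of `wanted` pairs a
// column name with the type its values should be converted to.
inline std::vector<Column> readCols(
        std::istream& in,
        const std::vector<std::pair<std::string, ColType>>& wanted) {
    std::string line;
    if (!std::getline(in, line)) throw std::runtime_error("missing header line");
    const std::vector<std::string> header = splitFields(line);

    std::vector<std::size_t> index;  // position of each wanted column in the file
    std::vector<Column> out;         // one typed container per wanted column
    for (const auto& [name, type] : wanted) {
        std::size_t i = 0;
        while (i < header.size() && header[i] != name) ++i;
        if (i == header.size()) throw std::runtime_error("no column named " + name);
        index.push_back(i);
        switch (type) {
            case ColType::String: out.emplace_back(std::vector<std::string>{}); break;
            case ColType::Int:    out.emplace_back(std::vector<int>{});         break;
            case ColType::Double: out.emplace_back(std::vector<double>{});      break;
        }
    }

    while (std::getline(in, line)) {
        const std::vector<std::string> fields = splitFields(line);
        if (fields.empty()) continue;  // skip blank or comment-only lines
        for (std::size_t c = 0; c < out.size(); ++c) {
            const std::string& text = fields.at(index[c]);  // throws on short rows
            // Dispatch on whichever vector type this column currently holds.
            std::visit([&text](auto& vec) {
                using T = typename std::decay_t<decltype(vec)>::value_type;
                if constexpr (std::is_same_v<T, int>)         vec.push_back(std::stoi(text));
                else if constexpr (std::is_same_v<T, double>) vec.push_back(std::stod(text));
                else                                          vec.push_back(text);
            }, out[c]);
        }
    }
    return out;
}
```

A caller would ask for {{"token", ColType::String}, {"rank", ColType::Int}} and then pull each result out with std::get<std::vector<int>>(cols[1]) or similar. The cost of this design is that the caller has to name the type twice, once in the tag and once when unpacking the variant.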

I can think of at least two workarounds:

1. Read each column separately with a templated function that reads into any container (container::value_type lets the function know how to parse). I don't prefer this solution because the files are occasionally large (millions of lines), so parsing them multiple times would take a few extra minutes. That is not a negligible share of run time in programs whose calculation takes ~30 minutes and which will be run over and over.
2. Read all columns into containers of strings and convert them in the calling context rather than in the parsing context. This wouldn't be so bad, as I think I can do the conversion in one line with std::transform and boost::lexical_cast or something similar. Still, if I can avoid 2n lines of bloat, great (n = number of columns, typically 2 or 3; 2 lines per column to declare the container and then transform).
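The second workaround can be sketched as follows. This is only an illustration: it swaps std::stoi and std::stod in for boost::lexical_cast so that it has no dependencies, and the names convertColumn, toInt, and toDouble are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: convert a column of strings with any unary converter.
template <typename T, typename Convert>
std::vector<T> convertColumn(const std::vector<std::string>& raw, Convert convert) {
    std::vector<T> out;
    out.reserve(raw.size());
    std::transform(raw.begin(), raw.end(), std::back_inserter(out), convert);
    return out;
}

// Thin wrappers, because std::stoi and std::stod are overloaded and take extra
// defaulted parameters, so they cannot be handed to std::transform directly.
inline int toInt(const std::string& s) { return std::stoi(s); }
inline double toDouble(const std::string& s) { return std::stod(s); }
```

With a helper like this, each column costs one line at the call site, for example std::vector<int> rank = convertColumn<int>(rawRank, toInt);, which brings the 2n lines down to n. If Boost is available, a lambda that calls boost::lexical_cast<int>(s) should fit the same slot.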

It may be that the second workaround will require significantly less effort from me than a complete, generic solution; if that's the case, I'd like to know. I imagine that the second workaround might even be more efficient, but I'm mainly concerned with ease of use at the moment. If I can write one generic readCols function and be done with it, that's what I'd prefer.

Answer 1

When things get too complicated, I break the problem into smaller parts. So here's a suggestion.

Write a CSV reader class which can read comma-separated or other delimiter-separated values from a file. The class reads a line at a time and breaks the line into std::string fields. To access the fields, you implement functions like getString, getInt, getDouble, etc. that look up a field (by column name or index) and convert it to the appropriate type. So the reader does one well-defined thing and deals with a limited number of primitive types.

Then implement reader functions (or classes) that use your CSV reader. These reader functions know the specific types of the columns and where to put their values: in scalars, containers, etc.
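A rough sketch of that split, under some assumptions: the class name RowReader and the function readTokenAndRank are hypothetical, fields are taken to be whitespace-separated as in the question's sample data (a real CSV reader would split on a configurable delimiter), and anything after a '#' is dropped.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical class: reads one whitespace-delimited row at a time and
// exposes typed accessors by column name.
class RowReader {
public:
    explicit RowReader(std::istream& in) : in_(in) {
        std::string line;
        if (!std::getline(in_, line)) throw std::runtime_error("missing header line");
        const std::vector<std::string> names = split(line);
        for (std::size_t i = 0; i < names.size(); ++i) index_[names[i]] = i;
    }

    // Advance to the next non-empty row; returns false at end of input.
    bool next() {
        std::string line;
        while (std::getline(in_, line)) {
            fields_ = split(line);
            if (!fields_.empty()) return true;
        }
        return false;
    }

    const std::string& getString(const std::string& col) const {
        return fields_.at(index_.at(col));  // throws std::out_of_range if unknown
    }
    int getInt(const std::string& col) const { return std::stoi(getString(col)); }
    double getDouble(const std::string& col) const { return std::stod(getString(col)); }

private:
    static std::vector<std::string> split(const std::string& line) {
        std::istringstream ss(line.substr(0, line.find('#')));  // drop '#' comments
        std::vector<std::string> out;
        for (std::string f; ss >> f;) out.push_back(f);
        return out;
    }

    std::istream& in_;
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<std::string> fields_;
};

// A reader function that knows which columns it wants and what types they have.
inline void readTokenAndRank(std::istream& in,
                             std::vector<std::string>& token,
                             std::vector<int>& rank) {
    RowReader rows(in);
    while (rows.next()) {
        token.push_back(rows.getString("token"));
        rank.push_back(rows.getInt("rank"));
    }
}
```

The file is still parsed only once, and the type knowledge lives in the small reader function rather than in the generic class, so a new combination of columns means a new few-line function instead of a new parser.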

Answer 2

So long as the types of return values are limited, e.g. int, double, or std::string, a function like this will do the job:

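#include <string>
#include <vector>

using namespace std;

// Column names are grouped by the type their values should be parsed as;
// the parsed columns come back through the matching output pointer.
void readCols(string fileName, vector<string> stringCols,
      vector<string> intCols, vector<string> doubleCols,
      vector<vector<string> > *stringData,
      vector<vector<int> > *intData,
      vector<vector<double> > *doubleData);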

(Probably clear enough, but you list the column names you want according to what type they are.)
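The answer gives only the declaration. One possible body is sketched below, assuming whitespace-separated fields, a header line that names the columns, and '#' starting a comment; the helpers splitFields and locate are hypothetical and not part of the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using namespace std;

// Assumed helper (not in the answer): whitespace-separated fields, '#' comments dropped.
static vector<string> splitFields(const string& line) {
    istringstream in(line.substr(0, line.find('#')));
    vector<string> fields;
    for (string f; in >> f;) fields.push_back(f);
    return fields;
}

// Assumed helper: map requested names to their positions in the header.
static vector<size_t> locate(const vector<string>& header, const vector<string>& names) {
    vector<size_t> pos;
    for (const string& n : names) {
        auto it = find(header.begin(), header.end(), n);
        if (it == header.end()) throw runtime_error("no column named " + n);
        pos.push_back(static_cast<size_t>(it - header.begin()));
    }
    return pos;
}

void readCols(string fileName, vector<string> stringCols,
      vector<string> intCols, vector<string> doubleCols,
      vector<vector<string> > *stringData,
      vector<vector<int> > *intData,
      vector<vector<double> > *doubleData) {
    assert(stringData && intData && doubleData);
    ifstream in(fileName);
    string line;
    if (!getline(in, line)) throw runtime_error("cannot read header of " + fileName);
    const vector<string> header = splitFields(line);
    const vector<size_t> sPos = locate(header, stringCols);
    const vector<size_t> iPos = locate(header, intCols);
    const vector<size_t> dPos = locate(header, doubleCols);

    // One output vector per requested column, in the order the names were given.
    stringData->assign(sPos.size(), vector<string>());
    intData->assign(iPos.size(), vector<int>());
    doubleData->assign(dPos.size(), vector<double>());

    while (getline(in, line)) {  // single pass over the file
        const vector<string> f = splitFields(line);
        if (f.empty()) continue;
        for (size_t k = 0; k < sPos.size(); ++k) (*stringData)[k].push_back(f.at(sPos[k]));
        for (size_t k = 0; k < iPos.size(); ++k) (*intData)[k].push_back(stoi(f.at(iPos[k])));
        for (size_t k = 0; k < dPos.size(); ++k) (*doubleData)[k].push_back(stod(f.at(dPos[k])));
    }
}
```

A call for the token and rank columns would pass {"token"} as stringCols, {"rank"} as intCols, and an empty doubleCols, then read the results from (*stringData)[0] and (*intData)[0]. Rows with too few fields make f.at throw std::out_of_range in this sketch; a production version would probably want a clearer error.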

Whether this is more or less trouble than the workarounds is in the eye of the beholder.
