How can I return a variable number of containers of various types?
I have data that looks like this:
token eps rank # first line names columns
Intercept 9.362637e+00 1 # later lines hold data
A1 -2.395553e-01 30
G1 -3.864725e-01 50
T1 1.565497e-01 43
....
Different files will have different numbers of named columns, and the types of values in each column will vary among floats, ints, and strings.
I want to write a readCols function to which I send the names of columns (e.g. I may want the token and rank columns) and which will put the data in each specified column into a container of the appropriate type.
My problem is not in parsing the file but in returning a variable number of containers that hold different types. For instance, I want the token and rank columns put into vector<string> and vector<int> containers, respectively. The issue here is that I may want the eps column instead (stored in a vector), and I don't want to write a different readCols function for every conceivable combination of types. (The type of container doesn't matter to me. If I have to use only vectors, no problem; that each container holds a different type is the key.)
I'll probably need a container that holds different types to hold the different types of container. It looks like Boost.Variant might be the solution I want, but I don't know how to tell the parser which type I want each column to be (could I make something like a list of typenames? e.g. void readCols(string filename, vector<variant<various types of vector>> &data, vector<string> colNames, vector<typename> convertTo)). Likewise, Boost.Mpl.Vector may solve the problem, but again I can't quite figure out how to tell readCols how each column wants to be cast.
I can think of at least two workarounds:
1. Read each column separately using a templated function that can read into any container (container::value_type allows the function to know how to parse). I don't prefer this solution because the files are occasionally large (millions of lines), so parsing them multiple times would take an extra few minutes (not a negligible percentage of run time in programs whose calculation takes ~30 minutes; the program will be run over and over).
2. Read every column into containers of strings and convert afterwards. This wouldn't be too bad, since I think I can do the conversion in one line using std::transform and boost::lexical_cast or something similar. If I can avoid 2n lines of bloat, great (n = number of columns, typically 2 or 3; 2 lines per column to declare the container and then transform).
It may be that the second workaround will require significantly less effort from me than a complete, generic solution; if that's the case, I'd like to know. I imagine that the second workaround might even be more efficient, but I'm mainly concerned with ease of use at the moment. If I can write one generic readCols function and be done with it, that's what I'd prefer.
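The string-to-type conversion mentioned in the second workaround could be sketched roughly as follows. This is only an illustration: parseField and convertColumn are hypothetical names, and stream extraction stands in for boost::lexical_cast so that the example has no Boost dependency. Unlike lexical_cast, stream extraction accepts trailing characters after a valid number.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parse one field into T using stream extraction.
// boost::lexical_cast<T>(field) could be used here instead.
template <typename T>
T parseField(const std::string& field) {
    std::istringstream in(field);
    T value;
    if (!(in >> value)) {
        throw std::invalid_argument("cannot convert field: " + field);
    }
    return value;
}

// Hypothetical helper: convert a whole column of strings to a vector<T>.
template <typename T>
std::vector<T> convertColumn(const std::vector<std::string>& raw) {
    std::vector<T> out;
    out.reserve(raw.size());
    std::transform(raw.begin(), raw.end(), std::back_inserter(out),
                   [](const std::string& s) { return parseField<T>(s); });
    return out;
}
```

With helpers like these, each column costs one line at the call site, for example `std::vector<int> rank = convertColumn<int>(rawRank);`, where rawRank is an assumed vector<string> filled by the parser.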
When things get too complicated, I break the problem into smaller parts. So here's a suggestion.
Write a CSV reader class that can read comma-separated (or otherwise delimited) values from a file. The class reads one line at a time and breaks the line into std::string fields. To access the fields, you implement functions like getString, getInt, getDouble, etc. that look up a field (by column name or index) and convert it to the appropriate type. So the reader does one well-defined thing and deals with a limited number of primitive types.
Then implement reader functions (or classes) that use your CSV reader. These reader functions know the specific types of the columns and where to put their values: in scalars, containers, etc.
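A minimal sketch of such a reader might look like the following. The class name ColumnReader and its layout are assumptions made for illustration; only the getString, getInt, and getDouble accessor names come from the suggestion above. It splits on whitespace and treats text after '#' as a comment, which matches the sample data in the question rather than true CSV, and it uses std::stoi and std::stod, so it assumes C++11.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader for whitespace-separated columns with a header line.
class ColumnReader {
public:
    explicit ColumnReader(std::istream& in) : in_(in) {
        // The first line names the columns.
        std::string line;
        if (!std::getline(in_, line)) {
            throw std::runtime_error("missing header line");
        }
        const std::vector<std::string> names = split(line);
        for (std::size_t i = 0; i < names.size(); ++i) {
            index_[names[i]] = i;
        }
    }

    // Advance to the next data line; returns false at end of input.
    bool next() {
        std::string line;
        while (std::getline(in_, line)) {
            fields_ = split(line);
            if (!fields_.empty()) return true;  // skip blank or comment-only lines
        }
        return false;
    }

    // Access the current line's field by column name.
    const std::string& getString(const std::string& column) const {
        auto it = index_.find(column);
        if (it == index_.end() || it->second >= fields_.size()) {
            throw std::out_of_range("no such column: " + column);
        }
        return fields_[it->second];
    }
    int getInt(const std::string& column) const { return std::stoi(getString(column)); }
    double getDouble(const std::string& column) const { return std::stod(getString(column)); }

private:
    // Split on whitespace, ignoring anything after '#'.
    static std::vector<std::string> split(const std::string& line) {
        std::istringstream in(line.substr(0, line.find('#')));
        std::vector<std::string> out;
        std::string field;
        while (in >> field) out.push_back(field);
        return out;
    }

    std::istream& in_;
    std::map<std::string, std::size_t> index_;
    std::vector<std::string> fields_;
};
```

A caller that knows it wants token and rank would then loop with `while (reader.next())` and push `reader.getString("token")` and `reader.getInt("rank")` into its own vector<string> and vector<int>, so the file is parsed only once.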
So long as the types of return values are limited, e.g. int, double, or std::string, a function like this will do the job:
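#include <string>
#include <vector>

using namespace std;

// List the wanted column names by type; each output holds one vector per requested column.
void readCols(string fileName, vector<string> stringCols,
              vector<string> intCols, vector<string> doubleCols,
              vector<vector<string> > *stringData,
              vector<vector<int> > *intData,
              vector<vector<double> > *doubleData);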
(Probably clear enough, but you list the column names you want according to what type they are.)
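One possible implementation of that declaration is sketched below. It is not the answerer's code: the stream-based core readColsFrom and the helpers splitFields and locate are hypothetical, the parsing assumes whitespace-separated fields with '#' comments as in the sample data, and std::stoi and std::stod require C++11.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using namespace std;

// Split a line on whitespace, ignoring anything after '#'.
static vector<string> splitFields(const string& line) {
    istringstream in(line.substr(0, line.find('#')));
    vector<string> fields;
    string field;
    while (in >> field) fields.push_back(field);
    return fields;
}

// Look up the position of each requested column name in the header.
static vector<size_t> locate(const map<string, size_t>& header,
                             const vector<string>& names) {
    vector<size_t> positions;
    for (size_t i = 0; i < names.size(); ++i) {
        map<string, size_t>::const_iterator it = header.find(names[i]);
        if (it == header.end()) throw runtime_error("unknown column: " + names[i]);
        positions.push_back(it->second);
    }
    return positions;
}

// Hypothetical stream-based core: one pass over the input fills every column.
void readColsFrom(istream& in, vector<string> stringCols,
                  vector<string> intCols, vector<string> doubleCols,
                  vector<vector<string> >* stringData,
                  vector<vector<int> >* intData,
                  vector<vector<double> >* doubleData) {
    string line;
    if (!getline(in, line)) throw runtime_error("missing header line");
    const vector<string> names = splitFields(line);
    map<string, size_t> header;
    for (size_t i = 0; i < names.size(); ++i) header[names[i]] = i;

    const vector<size_t> sPos = locate(header, stringCols);
    const vector<size_t> iPos = locate(header, intCols);
    const vector<size_t> dPos = locate(header, doubleCols);
    stringData->assign(sPos.size(), vector<string>());
    intData->assign(iPos.size(), vector<int>());
    doubleData->assign(dPos.size(), vector<double>());

    while (getline(in, line)) {
        const vector<string> f = splitFields(line);
        if (f.empty()) continue;  // skip blank or comment-only lines
        if (f.size() < names.size()) throw runtime_error("short line: " + line);
        for (size_t k = 0; k < sPos.size(); ++k) (*stringData)[k].push_back(f[sPos[k]]);
        for (size_t k = 0; k < iPos.size(); ++k) (*intData)[k].push_back(stoi(f[iPos[k]]));
        for (size_t k = 0; k < dPos.size(); ++k) (*doubleData)[k].push_back(stod(f[dPos[k]]));
    }
}

// Same signature as the declaration above; opens the file and delegates.
void readCols(string fileName, vector<string> stringCols,
              vector<string> intCols, vector<string> doubleCols,
              vector<vector<string> >* stringData,
              vector<vector<int> >* intData,
              vector<vector<double> >* doubleData) {
    ifstream in(fileName.c_str());
    if (!in) throw runtime_error("cannot open " + fileName);
    readColsFrom(in, stringCols, intCols, doubleCols, stringData, intData, doubleData);
}
```

Under these assumptions, a call such as `readCols("data.txt", {"token"}, {"rank"}, {}, &s, &i, &d)` (with "data.txt" as a placeholder file name) would leave the tokens in s[0] and the ranks in i[0], with the results ordered the same way as the name lists.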
Whether this is more or less trouble than the workarounds is in the eye of the beholder.