
How can I return a variable number of containers of various types?

I have data that looks like this:

     token            eps  rank # first line names columns
 Intercept   9.362637e+00     1 # later lines hold data
        A1  -2.395553e-01    30
        G1  -3.864725e-01    50
        T1   1.565497e-01    43
....

Different files will have different numbers of named columns and the types of values in each column will vary among floats, ints, and strings.

I want to write a readCols function to which I pass the names of columns (e.g. I may want the token and rank columns) and which puts the data from each specified column into a container of the appropriate type.

My problem is not in parsing the file but in returning a variable number of containers that hold different types. For instance, I want the token and rank columns put into vector<string> and vector<int> containers, respectively. The issue is that another time I may want the eps column instead (stored in a vector of floating-point values), and I don't want to write a different readCols function for every conceivable combination of types. (The kind of container doesn't matter to me. If I can only use vectors, no problem; the key point is that each container holds a different element type.)

I'll probably need a container that can hold the differently typed containers. It looks like Boost.Variant might be the solution I want, but I don't know how to tell the parser which type I want each column to be. Could I make something like a list of typenames, e.g. void readCols(string filename, vector<variant<various types of vector>> &data, vector<string> colNames, vector<typename> convertTo)? Likewise, Boost.Mpl.Vector may solve the problem, but again I can't quite figure out how to tell readCols what type each column should be converted to.
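One way to approximate "a list of typenames" at run time is a list of (column name, type tag) pairs. The following is a minimal sketch under several assumptions: it requires C++17 and uses std::variant in place of Boost.Variant, the input is whitespace-separated with an optional trailing # comment, and ColType, Column, ColSpec, splitFields, and readVariantCols are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical type tag: stands in for the "list of typenames" in the question.
enum class ColType { String, Int, Double };

// One column is a vector of exactly one of the supported element types.
using Column = std::variant<std::vector<std::string>,
                            std::vector<int>,
                            std::vector<double>>;

// A requested column: its header name and the type to convert it to.
using ColSpec = std::pair<std::string, ColType>;

// Splits a line on whitespace after dropping any trailing "# ..." comment.
std::vector<std::string> splitFields(std::string line) {
    line = line.substr(0, line.find('#'));
    std::istringstream fields(line);
    std::vector<std::string> out;
    for (std::string f; fields >> f;) out.push_back(f);
    return out;
}

// Reads the whole input in a single pass. The first line names the columns.
// Returns one Column per entry in `wanted`, in the same order. A name that is
// not in the header yields an empty column. std::stoi and std::stod throw on
// malformed numbers; this sketch does not catch those exceptions.
std::vector<Column> readVariantCols(std::istream& in,
                                    const std::vector<ColSpec>& wanted) {
    std::vector<Column> out;
    for (const ColSpec& spec : wanted) {
        switch (spec.second) {
            case ColType::String: out.emplace_back(std::vector<std::string>{}); break;
            case ColType::Int:    out.emplace_back(std::vector<int>{}); break;
            case ColType::Double: out.emplace_back(std::vector<double>{}); break;
        }
    }

    std::string line;
    if (!std::getline(in, line)) return out;
    const std::vector<std::string> header = splitFields(line);

    // index[i] is the header position of wanted[i], or header.size() if absent.
    std::vector<std::size_t> index;
    for (const ColSpec& spec : wanted) {
        std::size_t pos = 0;
        while (pos < header.size() && header[pos] != spec.first) ++pos;
        index.push_back(pos);
    }

    while (std::getline(in, line)) {
        const std::vector<std::string> fields = splitFields(line);
        for (std::size_t i = 0; i < wanted.size(); ++i) {
            // Skip columns missing from the header and rows that are too short.
            if (index[i] >= header.size() || index[i] >= fields.size()) continue;
            const std::string& text = fields[index[i]];
            switch (wanted[i].second) {
                case ColType::String:
                    std::get<std::vector<std::string>>(out[i]).push_back(text);
                    break;
                case ColType::Int:
                    std::get<std::vector<int>>(out[i]).push_back(std::stoi(text));
                    break;
                case ColType::Double:
                    std::get<std::vector<double>>(out[i]).push_back(std::stod(text));
                    break;
            }
        }
    }
    return out;
}
```

A caller would request columns with something like readVariantCols(file, {{"token", ColType::String}, {"rank", ColType::Int}}) and then pull each vector out with std::get<std::vector<int>>(cols[1]). The caller still has to name the type a second time when unpacking, which is the main cost of a run-time type list.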

I can think of at least two workarounds:

  1. Read each column separately with a templated function that reads into any container (container::value_type tells the function how to parse). I don't prefer this solution because the files are occasionally large (millions of lines), so parsing them multiple times would take an extra few minutes. That is not a negligible percentage of run time in programs whose calculation takes ~30 minutes and which will be run over and over.
  2. Read all columns into containers of strings and convert them in the calling context rather than in the parsing context. This wouldn't be so bad, as I think I can do the conversion in one line with std::transform and boost::lexical_cast or something similar. Still, if I can avoid 2n lines of bloat, great (n = number of columns, typically 2 or 3; 2 lines per column, one to declare the container and one to transform).
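The second workaround can be sketched roughly as follows. This assumes the columns have already been read as vector<string>, and it uses stream extraction as a stand-in for boost::lexical_cast so that the example needs only the standard library; fromText and recast are hypothetical helper names.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: converts one text field to T with stream extraction.
// Unlike boost::lexical_cast it does not report malformed input; a field
// that fails to parse simply comes back as a value-initialized T.
template <typename T>
T fromText(const std::string& text) {
    std::istringstream in(text);
    T value{};
    in >> value;
    return value;
}

// Converts a column that was read as strings into a vector<T>.
template <typename T>
std::vector<T> recast(const std::vector<std::string>& column) {
    std::vector<T> out;
    out.reserve(column.size());
    std::transform(column.begin(), column.end(), std::back_inserter(out),
                   [](const std::string& s) { return fromText<T>(s); });
    return out;
}
```

With a helper like this, the calling context needs one line per column, e.g. std::vector<int> rank = recast<int>(rankText);, rather than two.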

It may be that the second workaround will require significantly less effort from me than a complete, generic solution; if that's the case, I'd like to know. I imagine that the second workaround might even be more efficient, but I'm mainly concerned with ease of use at the moment. If I can write one generic readCols function and be done with it, that's what I'd prefer.

When things get too complicated, I break the problem into smaller parts. So here's a suggestion.

Write a CSV reader class that can read comma-separated (or otherwise delimited) values from a file. The class reads one line at a time and breaks it into std::string fields. To access the fields, implement functions like getString, getInt, getDouble, etc. that look up a field (by column name or index) and convert it to the appropriate type. That way the reader does one well-defined thing and deals with a limited number of primitive types.

Then implement reader functions (or classes) that use your CSV reader. These reader functions know the specific types of the columns and where to put their values: in scalars, containers, etc.
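A minimal sketch of that split might look like the following. It assumes whitespace-separated fields, as in the sample data, rather than commas, and it looks columns up by name only; RowReader and readTokenAndRank are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader: splits each line into string fields and converts a
// field only when the caller asks for it by column name.
class RowReader {
public:
    // Consumes the first line of `in` as the header that names the columns.
    explicit RowReader(std::istream& in) : in_(in) {
        std::string line;
        if (std::getline(in_, line)) header_ = split(line);
    }

    // Advances to the next data line; returns false at end of input.
    bool next() {
        std::string line;
        if (!std::getline(in_, line)) return false;
        fields_ = split(line);
        return true;
    }

    // Accessors throw std::out_of_range for an unknown column or a short row.
    const std::string& getString(const std::string& column) const {
        return fields_.at(indexOf(column));
    }
    int getInt(const std::string& column) const {
        return std::stoi(getString(column));
    }
    double getDouble(const std::string& column) const {
        return std::stod(getString(column));
    }

private:
    // Splits on whitespace after dropping any trailing "# ..." comment.
    static std::vector<std::string> split(std::string line) {
        line = line.substr(0, line.find('#'));
        std::istringstream fields(line);
        std::vector<std::string> out;
        for (std::string f; fields >> f;) out.push_back(f);
        return out;
    }

    std::size_t indexOf(const std::string& column) const {
        for (std::size_t i = 0; i < header_.size(); ++i)
            if (header_[i] == column) return i;
        throw std::out_of_range("no such column: " + column);
    }

    std::istream& in_;
    std::vector<std::string> header_;
    std::vector<std::string> fields_;
};

// A reader function that knows the column types for one particular use.
void readTokenAndRank(std::istream& in,
                      std::vector<std::string>& token,
                      std::vector<int>& rank) {
    RowReader reader(in);
    while (reader.next()) {
        token.push_back(reader.getString("token"));
        rank.push_back(reader.getInt("rank"));
    }
}
```

The file is parsed once, and only the requested fields are converted. For files with millions of lines, resolving each column name to an index once up front, instead of on every row as this sketch does, would likely be worthwhile.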

So long as the types of return values are limited, e.g. int, double, or std::string, a function like this will do the job:

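#include <string>
#include <vector>
using namespace std;

// Column names are grouped by the type they should be parsed as; the i-th
// name in each list fills the i-th vector of the matching output argument.
void readCols(string fileName, vector<string> stringCols,
      vector<string> intCols, vector<string> doubleCols,
      vector<vector<string> > *stringData,
      vector<vector<int> > *intData,
      vector<vector<double> > *doubleData);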

(Probably clear enough, but you list the column names you want according to what type they are.)
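The declaration leaves the body open. One possible single-pass implementation is sketched below, under these assumptions: fields are whitespace-separated, the first line is the header, and the core function takes a std::istream (here called readColsFrom, a hypothetical name) so that it can be exercised without a file. Kind, Target, splitFields, and mark are likewise hypothetical helpers, and the name lists are taken by const reference rather than by value.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

namespace {

// What to do with one header position: skip it, or parse it into a given slot.
enum class Kind { Skip, String, Int, Double };
struct Target { Kind kind; std::size_t slot; };

// Splits on whitespace after dropping any trailing "# ..." comment.
std::vector<std::string> splitFields(std::string line) {
    line = line.substr(0, line.find('#'));
    std::istringstream fields(line);
    std::vector<std::string> out;
    for (std::string f; fields >> f;) out.push_back(f);
    return out;
}

// Marks every header position whose name appears in `names`. If a name is
// requested under two types, the later call to mark wins.
void mark(const std::vector<std::string>& header,
          const std::vector<std::string>& names,
          Kind kind, std::vector<Target>& targets) {
    for (std::size_t slot = 0; slot < names.size(); ++slot)
        for (std::size_t col = 0; col < header.size(); ++col)
            if (header[col] == names[slot]) targets[col] = Target{kind, slot};
}

}  // namespace

void readColsFrom(std::istream& in,
                  const std::vector<std::string>& stringCols,
                  const std::vector<std::string>& intCols,
                  const std::vector<std::string>& doubleCols,
                  std::vector<std::vector<std::string>>* stringData,
                  std::vector<std::vector<int>>* intData,
                  std::vector<std::vector<double>>* doubleData) {
    // One (initially empty) output vector per requested column name.
    stringData->assign(stringCols.size(), std::vector<std::string>());
    intData->assign(intCols.size(), std::vector<int>());
    doubleData->assign(doubleCols.size(), std::vector<double>());

    std::string line;
    if (!std::getline(in, line)) return;
    const std::vector<std::string> header = splitFields(line);

    std::vector<Target> targets(header.size(), Target{Kind::Skip, 0});
    mark(header, stringCols, Kind::String, targets);
    mark(header, intCols, Kind::Int, targets);
    mark(header, doubleCols, Kind::Double, targets);

    while (std::getline(in, line)) {
        const std::vector<std::string> fields = splitFields(line);
        for (std::size_t col = 0; col < fields.size() && col < targets.size(); ++col) {
            const Target& t = targets[col];
            switch (t.kind) {
                case Kind::String: (*stringData)[t.slot].push_back(fields[col]); break;
                case Kind::Int:    (*intData)[t.slot].push_back(std::stoi(fields[col])); break;
                case Kind::Double: (*doubleData)[t.slot].push_back(std::stod(fields[col])); break;
                case Kind::Skip:   break;
            }
        }
    }
}

// File-based wrapper matching the shape of the declaration in the answer.
void readCols(const std::string& fileName,
              const std::vector<std::string>& stringCols,
              const std::vector<std::string>& intCols,
              const std::vector<std::string>& doubleCols,
              std::vector<std::vector<std::string>>* stringData,
              std::vector<std::vector<int>>* intData,
              std::vector<std::vector<double>>* doubleData) {
    std::ifstream file(fileName);
    readColsFrom(file, stringCols, intCols, doubleCols,
                 stringData, intData, doubleData);
}
```

Because the header is resolved to a dispatch table once, each data line costs one split plus one conversion per requested column, and the file is read only once regardless of how many columns are requested.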

Whether this is more or less trouble than the workarounds is in the eye of the beholder.


 