I want to implement a data table whose fields may have different types. One field may be a vector of strings; another may be a vector of floats. The types of the fields are unknown at compile time, because I want to be able to construct a data table from a CSV file.
How can I do this in C++?
Use boost::variant, which can represent one of a set of types:
std::vector<boost::variant<std::string, float>> values;
You can then apply a visitor to the variant:
struct visitor_t : boost::static_visitor<> {
    void operator()(std::string const& x) const {
        std::cout << "got string: " << x << '\n';
    }
    void operator()(float x) const {
        std::cout << "got float: " << x << '\n';
    }
};
visitor_t visitor;
for (auto&& value : values) {
    boost::apply_visitor(visitor, value);
}
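Since the question is about CSV input, here is a minimal sketch of how a text cell might be turned into such a variant at runtime. It uses C++17 std::variant and std::visit in place of boost::variant; the names Cell, parse_cell, describe and number_at are hypothetical and not part of either library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <variant>
#include <vector>

// A cell holds either text or a number (C++17 std::variant).
using Cell = std::variant<std::string, float>;

// Hypothetical helper: store the text as a float if the whole string parses
// as a number, otherwise keep it as a string.
Cell parse_cell(const std::string& text) {
    if (!text.empty()) {
        char* end = nullptr;
        const float number = std::strtof(text.c_str(), &end);
        if (end == text.c_str() + text.size()) {
            return number;
        }
    }
    return text;
}

// Visitor that reports which alternative a cell currently holds.
struct type_name_visitor {
    std::string operator()(const std::string&) const { return "string"; }
    std::string operator()(float) const { return "float"; }
};

// Apply the visitor to every cell and collect the type names.
std::vector<std::string> describe(const std::vector<Cell>& cells) {
    std::vector<std::string> names;
    for (const Cell& cell : cells) {
        names.push_back(std::visit(type_name_visitor{}, cell));
    }
    return names;
}

// Read a numeric cell; std::get throws std::bad_variant_access for text cells.
float number_at(const std::vector<Cell>& cells, std::size_t index) {
    assert(index < cells.size());
    return std::get<float>(cells[index]);
}
```

Note that this decides the type cell by cell, so one column can end up holding a mix of strings and floats; whether that is acceptable depends on the data.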
I have tried something similar:
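class Component
{
public:
    virtual ~Component() = default; // required to delete through a base pointer
    // Common interface methods
};
class Field : public Component
{
    // Common interface methods
public:
    virtual std::string get_field_name() const = 0;
    virtual std::string get_value_as_string() const = 0;
};
class Record : public Component
{
    // Common interface methods
    std::vector< std::unique_ptr<Component> > fields;
};
class Integer_Field : public Field
{
    // Overrides get_field_name() and get_value_as_string()
};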
The idea is that a Record can contain various fields. The fields are held through pointers to the Component base class, which also allows a Record to contain sub-records.
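To make the idea concrete, here is a small self-contained sketch of that hierarchy. It assumes that get_value_as_string() belongs to the Component interface (so that a Record can print its sub-records), and the add() method and the String_Field class are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Assumption: every component can render itself as text.
class Component {
public:
    virtual ~Component() = default;
    virtual std::string get_value_as_string() const = 0;
};

class Field : public Component {
public:
    explicit Field(std::string name) : m_name(std::move(name)) {}
    std::string get_field_name() const { return m_name; }
private:
    std::string m_name;
};

class Integer_Field : public Field {
public:
    Integer_Field(std::string name, int value)
        : Field(std::move(name)), m_value(value) {}
    std::string get_value_as_string() const override {
        return std::to_string(m_value);
    }
private:
    int m_value;
};

class String_Field : public Field {
public:
    String_Field(std::string name, std::string value)
        : Field(std::move(name)), m_value(std::move(value)) {}
    std::string get_value_as_string() const override { return m_value; }
private:
    std::string m_value;
};

// A Record owns its components, which may be fields or nested records.
class Record : public Component {
public:
    void add(std::unique_ptr<Component> component) {
        assert(component != nullptr);
        fields.push_back(std::move(component));
    }
    // Renders the components as a parenthesised, comma-separated list.
    std::string get_value_as_string() const override {
        std::string text = "(";
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i != 0) text += ", ";
            text += fields[i]->get_value_as_string();
        }
        return text + ")";
    }
private:
    std::vector<std::unique_ptr<Component>> fields;
};
```

The cost of this design is one heap allocation and one virtual call per field, and every new operation has to be added to the base-class interface.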
You should see Sean Parent's talk "Inheritance Is the Base Class of Evil." It is also available in print form, under "Value Semantics and Concept-based Polymorphism."
He proposes a concept-based object class that defines an interface for the elements of a container. Any object that meets the required interface (i.e., has the required free-standing functions) can be put in the container.
You might be able to pick up the gist from looking at the code sample below (taken from the documentation I linked above).
class object_t {
  public:
    template <typename T>
    object_t(T x) : self_(make_shared<model<T>>(move(x)))
    { }
    friend void draw(const object_t& x, ostream& out, size_t position)
    { x.self_->draw_(out, position); }
  private:
    struct concept_t {
        virtual ~concept_t() = default;
        virtual void draw_(ostream&, size_t) const = 0;
    };
    template <typename T>
    struct model : concept_t {
        model(T x) : data_(move(x)) { }
        void draw_(ostream& out, size_t position) const
        { draw(data_, out, position); }
        T data_;
    };
    shared_ptr<const concept_t> self_;
};
Your fields would each be one of these object_t types, which would take any type (std::vector<int>, std::deque<float>, std::string, etc.). You would just need to be sure that whatever methods you want to be supported for object_t (in the example, it's just draw()) are defined somewhere for your different inputs. This is nice because it gives you value semantics and also makes it very simple to add new types.
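As a rough illustration of how this might look for table fields, the sketch below repeats object_t with std:: qualifiers, adds a generic draw() fallback for streamable types, and defines one column type with its own draw() overload. The float_column, table_t and render names are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Generic fallback: any streamable type can be drawn. It is declared before
// object_t so that model<T>::draw_ can find it for built-in types.
template <typename T>
void draw(const T& x, std::ostream& out, std::size_t position) {
    out << std::string(position, ' ') << x << '\n';
}

class object_t {
public:
    template <typename T>
    object_t(T x) : self_(std::make_shared<model<T>>(std::move(x))) {}

    friend void draw(const object_t& x, std::ostream& out, std::size_t position) {
        assert(x.self_ != nullptr);  // a moved-from object_t holds no model
        x.self_->draw_(out, position);
    }

private:
    struct concept_t {
        virtual ~concept_t() = default;
        virtual void draw_(std::ostream&, std::size_t) const = 0;
    };
    template <typename T>
    struct model final : concept_t {
        model(T x) : data_(std::move(x)) {}
        void draw_(std::ostream& out, std::size_t position) const override {
            draw(data_, out, position);
        }
        T data_;
    };
    std::shared_ptr<const concept_t> self_;
};

// Hypothetical column type. Its draw() overload is picked up through
// argument-dependent lookup, with no change to object_t.
struct float_column {
    std::string name;
    std::vector<float> values;
};

void draw(const float_column& column, std::ostream& out, std::size_t position) {
    out << std::string(position, ' ') << column.name << ':';
    for (float value : column.values) out << ' ' << value;
    out << '\n';
}

// A table is a plain vector of type-erased fields.
using table_t = std::vector<object_t>;

void draw(const table_t& table, std::ostream& out, std::size_t position) {
    for (const object_t& field : table) draw(field, out, position);
}

// Convenience wrapper that draws a table into a string.
std::string render(const table_t& table) {
    std::ostringstream out;
    draw(table, out, 0);
    return out.str();
}
```

Adding a new field type then only requires a draw() overload for it; nothing in object_t or table_t changes.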
Because the data types are not known at compile time, you must construct and store that information at runtime. For each field of each row, there are potentially three pieces of information to encode:
You could use polymorphic types, boost::any, or boost::variant (or std::any or std::variant, as defined in C++17), but a more elegant, robust and memory-efficient solution would take advantage of the fact that every row has the same structure.
What you are doing is basically creating a database program. In a database, a schema encodes the structure of the data, but is separate from the data itself. What you want is a way to encode a schema at runtime, something like this:
enum class FieldType {
    // Scalar types:
    Boolean, Integer, FloatingPoint, String,
    // Array types:
    ArrayBit = 0x1000, // This bit set for array types
    Boolean_Array = Boolean | ArrayBit,
    Integer_Array, FloatingPoint_Array, String_Array
};
class FieldSchema {
    FieldType m_type;
    std::string m_name; // Optional, if fields are named
    ...
};
class RowSchema {
    std::vector<FieldSchema> m_fields;
    ...
};
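The ArrayBit flag in FieldType above makes it cheap to test whether a field is an array and to recover its element type. A minimal sketch follows, repeating the enum so the snippet stands alone; the helper names to_bits, is_array, element_type and array_of are hypothetical.

```cpp
#include <cassert>

enum class FieldType {
    // Scalar types:
    Boolean, Integer, FloatingPoint, String,
    // Array types:
    ArrayBit = 0x1000,  // This bit set for array types
    Boolean_Array = Boolean | ArrayBit,
    Integer_Array, FloatingPoint_Array, String_Array
};

// Scoped enums do not convert to int implicitly, so cast explicitly.
constexpr int to_bits(FieldType type) { return static_cast<int>(type); }

// True when the array flag is set.
constexpr bool is_array(FieldType type) {
    return (to_bits(type) & to_bits(FieldType::ArrayBit)) != 0;
}

// Strip the array flag to obtain the scalar element type.
constexpr FieldType element_type(FieldType type) {
    return static_cast<FieldType>(to_bits(type) & ~to_bits(FieldType::ArrayBit));
}

// Map a scalar type to the corresponding array type.
inline FieldType array_of(FieldType scalar) {
    assert(!is_array(scalar));
    return static_cast<FieldType>(to_bits(scalar) | to_bits(FieldType::ArrayBit));
}

static_assert(is_array(FieldType::String_Array), "array types carry ArrayBit");
static_assert(!is_array(FieldType::String), "scalar types do not");
```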
A data field itself is simply a union of the possible data types. (Note that putting a string or vector into a union requires C++11 or later.)
union FieldValue {
    bool m_boolean;
    int m_integer;
    double m_floatingpoint;
    std::string m_string;
    std::vector<bool> m_boolean_array;
    std::vector<int> m_integer_array;
    std::vector<double> m_floatingpoint_array;
    std::vector<std::string> m_string_array;
    // Constructors for each type go here
};
And a data row is simply a vector of data fields, with a pointer to the schema:
class RowValue {
    RowSchema* m_schema;
    std::vector<FieldValue> m_fields;
    ...
};
Now, for each CSV file, there will be one RowSchema object for the entire table, but one RowValue object for each row. All of the RowValue objects for a given file will share (point to) the same RowSchema object. The process for reading a CSV file is:
1. Determine the structure of the table: the number of fields and the type of each.
2. Create a RowSchema object reflecting that structure.
3. For each row, create a RowValue object pointing to the RowSchema from step 2; read each field into the correct data type as specified in the corresponding FieldSchema; and append the value to the end of the m_fields array using emplace_back.
Since this is a Stack Overflow answer and not a textbook about C++11, I won't go into detail on how to construct a union that contains a string or a vector, nor will I get into how to use vector::emplace_back. All of this information is available in other places (e.g., cppreference.com). This can also be done in C++03, with additional work to simulate a union of non-trivial types (e.g., by using boost::variant).
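The reading process described above could be sketched roughly as follows. This is a simplified illustration under several assumptions: only three scalar types are handled, std::variant (C++17) stands in for the hand-written union, the field types are guessed from one sample row, and the splitter ignores quoting and escaping. The split, infer_type, make_schema and parse_row helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

enum class FieldType { Integer, FloatingPoint, String };

struct FieldSchema { FieldType type; std::string name; };
struct RowSchema { std::vector<FieldSchema> fields; };

// Assumption: std::variant replaces the union to keep the sketch short.
using FieldValue = std::variant<int, double, std::string>;

struct RowValue {
    const RowSchema* schema;  // shared by every row; must outlive the rows
    std::vector<FieldValue> fields;
};

// Naive splitter: no quoting or escaping, and a trailing empty cell is dropped.
std::vector<std::string> split(const std::string& line, char separator = ',') {
    std::vector<std::string> cells;
    std::string cell;
    std::istringstream in(line);
    while (std::getline(in, cell, separator)) cells.push_back(cell);
    return cells;
}

// Guess a type from one sample cell: integer, then floating point, else string.
FieldType infer_type(const std::string& text) {
    std::size_t used = 0;
    try {
        (void)std::stoi(text, &used);
        if (used == text.size()) return FieldType::Integer;
    } catch (const std::exception&) {}
    try {
        (void)std::stod(text, &used);
        if (used == text.size()) return FieldType::FloatingPoint;
    } catch (const std::exception&) {}
    return FieldType::String;
}

// Steps 1 and 2: determine the structure and build the schema.
RowSchema make_schema(const std::string& header, const std::string& sample_row) {
    const std::vector<std::string> names = split(header);
    const std::vector<std::string> cells = split(sample_row);
    assert(names.size() == cells.size());  // a real reader would report an error
    RowSchema schema;
    for (std::size_t i = 0; i < names.size(); ++i) {
        schema.fields.push_back({infer_type(cells[i]), names[i]});
    }
    return schema;
}

// Step 3: parse one row according to the schema.
RowValue parse_row(const std::string& line, const RowSchema& schema) {
    const std::vector<std::string> cells = split(line);
    assert(cells.size() == schema.fields.size());
    RowValue row{&schema, {}};
    for (std::size_t i = 0; i < cells.size(); ++i) {
        switch (schema.fields[i].type) {
            case FieldType::Integer:       row.fields.emplace_back(std::stoi(cells[i])); break;
            case FieldType::FloatingPoint: row.fields.emplace_back(std::stod(cells[i])); break;
            case FieldType::String:        row.fields.emplace_back(cells[i]); break;
        }
    }
    return row;
}
```

Inferring types from a single row is fragile (an integer-looking first value may be followed by decimals), so a real reader would probably scan more rows or take the types from the user.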
Obviously, I've left out a lot of details. One caution I'll mention is that the destructor for FieldValue is insufficient to destroy a string or vector contained within the union. Instead, you'll have to look up the data type in the schema and explicitly call the correct destructor for the field. The destructor for RowValue must therefore iterate over the fields and destroy each individually. A C++17 std::variant (or boost::variant) would help here, at the cost of additional memory.
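To illustrate the destruction issue, here is a reduced sketch with only two field types. It makes some assumptions beyond the text above: the schema is simplified to a vector of FieldType, and the fields live in a fixed-size array rather than a std::vector, because a union with a non-trivial member has no implicit copy or move constructor, which std::vector needs when it grows. std::destroy_at requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <vector>

enum class FieldType { FloatingPoint, String };

union FieldValue {
    double m_floatingpoint;
    std::string m_string;

    FieldValue() : m_floatingpoint(0.0) {}  // starts out holding a double
    ~FieldValue() {}  // cannot know the active member, so it does nothing
};

class RowValue {
public:
    // The schema is not owned and must outlive the row.
    explicit RowValue(const std::vector<FieldType>& schema)
        : m_schema(&schema), m_fields(new FieldValue[schema.size()]) {
        // Activate the string member wherever the schema asks for one.
        for (std::size_t i = 0; i < schema.size(); ++i) {
            if (schema[i] == FieldType::String) {
                new (&m_fields[i].m_string) std::string();
            }
        }
    }
    RowValue(const RowValue&) = delete;
    RowValue& operator=(const RowValue&) = delete;

    // Look up each field's type and destroy the non-trivial members by hand.
    ~RowValue() {
        for (std::size_t i = 0; i < m_schema->size(); ++i) {
            if ((*m_schema)[i] == FieldType::String) {
                std::destroy_at(&m_fields[i].m_string);
            }
        }
    }

    void set_string(std::size_t i, const std::string& value) {
        assert(has_type(i, FieldType::String));
        m_fields[i].m_string = value;
    }
    void set_double(std::size_t i, double value) {
        assert(has_type(i, FieldType::FloatingPoint));
        m_fields[i].m_floatingpoint = value;
    }
    const std::string& get_string(std::size_t i) const {
        assert(has_type(i, FieldType::String));
        return m_fields[i].m_string;
    }
    double get_double(std::size_t i) const {
        assert(has_type(i, FieldType::FloatingPoint));
        return m_fields[i].m_floatingpoint;
    }

private:
    bool has_type(std::size_t i, FieldType type) const {
        return i < m_schema->size() && (*m_schema)[i] == type;
    }
    const std::vector<FieldType>* m_schema;
    std::unique_ptr<FieldValue[]> m_fields;
};
```

The amount of manual bookkeeping here is the main argument for std::variant, which tracks the active type and destroys it automatically.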