Data structures to implement unknown table schema in c/c++?

Our task is to read a table schema from a file, implement that table in C/C++, and then run some "select" queries against it. The schema file may have contents like this:

    Tablename- Student
    "ID","int(11)","NO","PRIMARY","0","".

My question is which data structures would be appropriate for the task. The problem is that I do not know in advance how many columns a table will have, what those columns will be named, or what their data types will be. For example, one table might have a single column of type int, while another might have 15 columns of varying data types. In fact, I don't even know how many tables the schema file describes.

One approach I thought of was to keep a fixed number of vectors, say 20 (assuming a table has at most 20 columns), name them 1stvector, 2ndvector and so on, map the column names to the vectors, and then use them accordingly. But the code would likely be a mess of if/else or switch/case statements for the mapping.

While googling and searching Stack Overflow, I learned that you can't define a class at runtime; otherwise the problem might have been easier to solve.

Any help is appreciated. Thanks.

As a C++ data structure, you could try a std::vector< std::vector<boost::any> >. A vector is part of the Standard Library and can grow or shrink dynamically. A vector of vectors gives you an arbitrary number of rows, each with an arbitrary number of columns. Boost.Any is not part of the Standard Library, but it is widely available and allows storing values of arbitrary types.
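A minimal sketch of that layout follows. It swaps in std::any (C++17), which behaves much like boost::any, so that the example needs nothing outside the Standard Library; with Boost you would use boost::any and boost::any_cast instead. The names Row, Table, make_sample_table and cell_as, and the sample data, are made up for illustration.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row holds one value per column; each value may have a different type.
using Row = std::vector<std::any>;
using Table = std::vector<Row>;

// Build a small two-column table: an int ID and a string name (made-up data).
Table make_sample_table() {
    Table t;
    t.push_back(Row{std::any(1), std::any(std::string("Alice"))});
    t.push_back(Row{std::any(2), std::any(std::string("Bob"))});
    return t;
}

// Read a cell back as T. std::any_cast throws std::bad_any_cast when the
// stored type is not exactly T, so the caller must know the column type.
template <typename T>
T cell_as(const Table& t, std::size_t row, std::size_t col) {
    assert(row < t.size() && col < t[row].size());
    return std::any_cast<T>(t[row][col]);
}
```

For example, cell_as<std::string>(table, 1, 1) reads the name in the second row. The column types still have to be tracked somewhere (for instance from the schema file), because the container itself does not remember which type belongs to which column.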

I am not aware of any good C++ library for running SQL queries on that data structure, so you might need to write your own. E.g. the SQL commands select and where would correspond to the STL algorithm std::find_if with an appropriate predicate passed as a function object.
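As a rough sketch of that idea, assuming for simplicity that every cell is stored as a std::string: the function object ColumnEquals plays the role of the where clause, std::find_if returns the first match, and std::copy_if (a close relative, used here because a query usually wants every match) collects all of them. All names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using Table = std::vector<Row>;

// Function object used as the WHERE predicate: true when row[column] == value.
struct ColumnEquals {
    std::size_t column;
    std::string value;
    bool operator()(const Row& row) const {
        return column < row.size() && row[column] == value;
    }
};

// First row matching the predicate, or table.end() if there is none.
Table::const_iterator select_first(const Table& table, const ColumnEquals& pred) {
    return std::find_if(table.begin(), table.end(), pred);
}

// Every row matching the predicate, copied into a new table.
Table select_all(const Table& table, const ColumnEquals& pred) {
    Table result;
    std::copy_if(table.begin(), table.end(), std::back_inserter(result), pred);
    return result;
}

// Projection: keep only one column of each row (the "select <column>" part).
std::vector<std::string> project(const Table& table, std::size_t column) {
    std::vector<std::string> values;
    for (const Row& row : table) {
        assert(column < row.size());
        values.push_back(row[column]);
    }
    return values;
}
```

A query such as "select ID from Student where Name = 'Bob'" would then be roughly project(select_all(table, ColumnEquals{1, "Bob"}), 0), given that ID is column 0 and Name is column 1.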

To deal with not knowing the column data types, you more or less have to store the raw input (i.e. strings, which suggests std::string) and coerce the interpretation as needed later on.

This also has the advantage that the column names can be stored in the same type.
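One possible shape for this, as a sketch only: the header and the cells are all std::string, and a cell is converted to a number at the moment a query needs it. StringTable and its member names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct StringTable {
    std::vector<std::string> column_names;       // header, same type as the cells
    std::vector<std::vector<std::string>> rows;  // raw text exactly as read

    // Append a row; it must have one value per declared column.
    void add_row(const std::vector<std::string>& row) {
        assert(row.size() == column_names.size());
        rows.push_back(row);
    }

    // Position of a column by name; throws if the name is unknown.
    std::size_t column_index(const std::string& name) const {
        for (std::size_t i = 0; i < column_names.size(); ++i) {
            if (column_names[i] == name) return i;
        }
        throw std::out_of_range("unknown column: " + name);
    }

    // Raw text of one cell.
    const std::string& text(std::size_t row, const std::string& name) const {
        return rows.at(row).at(column_index(name));
    }

    // Coerce a cell to an integer only when it is needed.
    // std::stol throws std::invalid_argument if the text is not numeric.
    long as_long(std::size_t row, const std::string& name) const {
        return std::stol(text(row, name));
    }
};
```

Looking columns up by name replaces the if/else mapping from the question: the number of columns is simply column_names.size(), whatever the schema file says.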

If you really want to determine the column type, you'll need to speculatively parse each column of input to see what it could be and make decisions on that basis.
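A hedged sketch of such speculative parsing, assuming only three candidate types (integer, real, text) and cells stored as strings. The enum and function names are hypothetical, and the checks lean on std::stoll and std::stod, which also accept leading whitespace and forms such as "inf", so a stricter checker may be wanted for real data.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class ColumnType { Integer, Real, Text };

// True when std::stoll consumes the entire string.
bool parses_as_integer(const std::string& s) {
    try {
        std::size_t used = 0;
        std::stoll(s, &used);
        return used == s.size();
    } catch (const std::exception&) {
        return false;  // std::invalid_argument or std::out_of_range
    }
}

// True when std::stod consumes the entire string.
bool parses_as_real(const std::string& s) {
    try {
        std::size_t used = 0;
        std::stod(s, &used);
        return used == s.size();
    } catch (const std::exception&) {
        return false;
    }
}

// Try the narrowest type first and widen whenever some value does not fit.
ColumnType infer_column_type(const std::vector<std::vector<std::string>>& rows,
                             std::size_t column) {
    if (rows.empty()) return ColumnType::Text;
    bool all_integer = true;
    bool all_real = true;
    for (const std::vector<std::string>& row : rows) {
        assert(column < row.size());
        if (!parses_as_integer(row[column])) all_integer = false;
        if (!parses_as_real(row[column])) all_real = false;
    }
    if (all_integer) return ColumnType::Integer;
    if (all_real) return ColumnType::Real;
    return ColumnType::Text;
}
```

When the schema file already states the type (as in "int(11)"), that declaration is probably the better source of truth and inference like this is only a fallback.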

Either way, if the input could contain a field that has the column separator in it (say, a string including a space in otherwise whitespace-separated data), you will have to know the quoting convention of the input and write a parser of some kind to work on the data (reading whole lines in with getline is your friend here). Your input appears to be comma-separated, with strings delimited by double quotes.
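A small sketch of such a parser for the sample line in the question. It assumes that every field is wrapped in double quotes, that quotes are never escaped inside a field, and that anything outside quotes (commas, spaces, the trailing period) can be ignored; none of this is confirmed by the question. The function names are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split one line into its double-quoted fields. A comma inside quotes is
// kept as part of the field; characters outside quotes are skipped.
std::vector<std::string> split_quoted_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (char c : line) {
        if (c == '"') {
            if (in_quotes) {  // closing quote: the field is complete
                fields.push_back(current);
                current.clear();
            }
            in_quotes = !in_quotes;
        } else if (in_quotes) {
            current += c;
        }
    }
    return fields;
}

// Read whole lines with std::getline and keep those that contain fields.
// Lines without any quoted field (such as the table name line) are skipped.
std::vector<std::vector<std::string>> read_schema_rows(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> fields = split_quoted_line(line);
        if (!fields.empty()) rows.push_back(fields);
    }
    return rows;
}

// Convenience wrapper for parsing text that is already in memory.
std::vector<std::vector<std::string>> parse_schema_text(const std::string& text) {
    std::istringstream in(text);
    return read_schema_rows(in);
}
```

With a file, the same read_schema_rows function can be given a std::ifstream opened on the schema file.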

I suggest using std::vector to hold all the table creation statements. After all the creation statements have been read in, you can construct your tables.
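One way to picture this, as a sketch under assumptions: each table description becomes a TableDef, and a std::vector<TableDef> holds however many the file contains. The struct names are invented, and the "Tablename-" prefix is taken from the single sample in the question; only the first two fields of a column line are kept because the meaning of the others is not stated there.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One column description from the schema file.
struct ColumnDef {
    std::string name;  // e.g. "ID"
    std::string type;  // e.g. "int(11)", kept as raw text
};

// One table description: a name plus any number of columns.
struct TableDef {
    std::string name;
    std::vector<ColumnDef> columns;
};

// Extract "Student" from a line such as "Tablename- Student".
// Returns an empty string when the line is not a table header.
std::string table_name_from(const std::string& line) {
    const std::string prefix = "Tablename-";
    const std::size_t begin = line.find_first_not_of(" \t");
    if (begin == std::string::npos) return "";
    if (line.compare(begin, prefix.size(), prefix) != 0) return "";
    const std::size_t start = line.find_first_not_of(" \t", begin + prefix.size());
    if (start == std::string::npos) return "";
    const std::size_t end = line.find_last_not_of(" \t\r");
    return line.substr(start, end - start + 1);
}

// Look up a table definition by name; nullptr if it was never declared.
const TableDef* find_table(const std::vector<TableDef>& tables,
                           const std::string& name) {
    for (const TableDef& t : tables) {
        if (t.name == name) return &t;
    }
    return nullptr;
}
```

Because both the list of tables and each table's list of columns are vectors, neither the number of tables nor the number of columns has to be known before the file is read.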

The problem to overcome is the plethora of column types. All the C++ containers want a uniform element type, such as std::vector<std::string>, but you will have different column types.

One solution is to have your data types descend from a single base class. That would allow you to have a std::vector<Base *> for each row of the table, where the pointers can point to fields of different {child} types.
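A minimal sketch of that hierarchy, with hypothetical class names. It stores std::unique_ptr<Field> rather than the raw Base * mentioned above so that each row frees its own fields; the idea is otherwise the same.

```cpp
#include <memory>
#include <string>
#include <vector>

// Common base for every field type, so one container can hold them all.
struct Field {
    virtual ~Field() = default;
    virtual std::string to_string() const = 0;
};

struct IntField : Field {
    long value;
    explicit IntField(long v) : value(v) {}
    std::string to_string() const override { return std::to_string(value); }
};

struct TextField : Field {
    std::string value;
    explicit TextField(const std::string& v) : value(v) {}
    std::string to_string() const override { return value; }
};

// A row is a sequence of pointers to fields of possibly different types.
using Row = std::vector<std::unique_ptr<Field>>;

// Build a row with an integer ID and a text name (made-up columns).
Row make_row(long id, const std::string& name) {
    Row row;
    row.push_back(std::unique_ptr<Field>(new IntField(id)));
    row.push_back(std::unique_ptr<Field>(new TextField(name)));
    return row;
}
```

Code that needs the concrete type can use dynamic_cast<IntField*>(row[0].get()), while generic code such as printing a row can stay on the virtual interface.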

I'll leave the rest up to the OP to figure out.
