What is the best way to store a relation in main memory?

I am working on an application that is a mini DBMS designed to evaluate SPJ (select-project-join) queries. The program is being implemented in C++.

When I process a query with joins and group-by, I need to keep a set of records in main memory. So I have to maintain temporary tables in main memory to execute the queries entered by the user.

My question is: what is the best way to achieve this in C++? Which data structure should I use?

In my application, I store data in binary files. Using the Catalog (which contains the schema for all existing tables), I need to retrieve the data and process it.

I have only two data types in my application: int (4 bytes) and char (1 byte).

I can use std::vector. In fact, I tried a vector of vectors, with the inner vector storing the attributes. The problem is that the database can contain many relations, and each of them may have any number of attributes. Also, each of these attributes can be either an int or a char. So I am unable to identify the best way to achieve this.

Edit

I cannot use a struct for the tables because I do not know how many columns the newly added tables will have: all tables are created at runtime according to the user's query. So a table schema cannot be stored in a struct.

A Relation is a Set of Tuples (and in SQL, a Table is a Bag of Rows). Both in Relational Theory and in SQL, all tuples (/rows) in a relation (/table) "comply to the heading".

So it makes sense for an object that stores a relation (/table) to consist of two components: an object of type "Heading" and a Set (/Bag) object containing the actual tuples (/rows).

The "Heading" object is itself a Mapping of attribute (/column) names to "declared data types". I don't know C, but in Java it might be something like Map<AttributeName,TypeName> or Map<AttributeName,Type> or even Map<String,String> (provided you can use those Strings to go get the actual 'Type' objects from wherever they reside).
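For readers working in C++ rather than Java, the heading idea above could be sketched roughly as follows. The names AttrType, Heading, storedSize, and tupleSize are made up for illustration and do not come from the question or the answer; the byte sizes are the ones the question states.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// The two attribute types the question says the DBMS supports.
// (AttrType is a hypothetical name used only in this sketch.)
enum class AttrType { Int, Char };

// Heading: attribute (column) name -> declared type, filled in at runtime.
using Heading = std::map<std::string, AttrType>;

// Size in bytes of one stored value, using the sizes given in the question
// (int = 4 bytes, char = 1 byte).
std::size_t storedSize(AttrType t) {
    switch (t) {
        case AttrType::Int:  return 4;
        case AttrType::Char: return 1;
    }
    assert(false && "unknown attribute type");
    return 0;
}

// Total number of bytes occupied by one tuple that conforms to the heading.
std::size_t tupleSize(const Heading& h) {
    std::size_t total = 0;
    for (const auto& entry : h) {
        total += storedSize(entry.second);
    }
    return total;
}
```

Note that std::map iterates in key order, not in the order the columns were declared. If column order matters, for instance when decoding rows from the binary files, a std::vector of (name, type) pairs may be a better fit.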

The set of tuples (/rows) consists of members that are all a Mapping of attribute (/column) names to attribute Values, which are either int or char in your case. The biggest problem here is that this suggests you need something like Map<AttributeName,Object>, but you might get into trouble over your ints not being objects.
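In C++ the "int is not an object" concern can be sidestepped with a tagged union. Below is a minimal sketch, assuming C++17 (for std::variant) and assuming the only value types are int and char, as the question states. All names here (Value, Tuple, Relation, conforms, intValue) are hypothetical and chosen for this example only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

enum class AttrType { Int, Char };
using Heading = std::map<std::string, AttrType>;  // name -> declared type
using Value   = std::variant<int, char>;          // one attribute value
using Tuple   = std::map<std::string, Value>;     // name -> value

// True when the tuple has exactly the heading's attributes and each value
// holds the declared type.
bool conforms(const Heading& h, const Tuple& t) {
    if (t.size() != h.size()) return false;
    for (const auto& attr : h) {
        auto it = t.find(attr.first);
        if (it == t.end()) return false;
        const bool isInt = std::holds_alternative<int>(it->second);
        if (isInt != (attr.second == AttrType::Int)) return false;
    }
    return true;
}

// A relation (/table): a heading plus a bag of rows built at runtime.
struct Relation {
    Heading heading;
    std::vector<Tuple> rows;  // a bag: duplicates are allowed, as in SQL

    // Appends the tuple only if it complies with the heading.
    bool insert(const Tuple& t) {
        if (!conforms(heading, t)) return false;
        rows.push_back(t);
        return true;
    }
};

// Reads an int attribute; the caller is expected to have checked the heading.
int intValue(const Tuple& t, const std::string& name) {
    const Value& v = t.at(name);
    assert(std::holds_alternative<int>(v));
    return std::get<int>(v);
}
```

A per-row map is easy to follow but costs a lookup per attribute. A std::vector<Value> indexed by column position would be a more compact variation on the same idea.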

As a generic container for the table rows, I'd most likely use std::vector (as pointed out by Iarsmans). As for the table columns, I'd most likely define them with structs representing the table schema. For example:

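#include <vector>

// One row of a table whose schema is known at compile time.
struct DataRow
{
    int col1;
    char col2;
};

typedef std::vector<DataRow> DataTable;

int main()
{
    DataTable t;   // the in-memory table
    DataRow dr;
    dr.col1 = 1;
    dr.col2 = 'a';

    t.push_back(dr);   // append the row to the table
    return 0;
}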
