Storing database data in files?

I'm currently working on a school project in Java: a database application, something like the MySQL Monitor, where you type in queries and get results.

In applications I've coded before, I used databases to store data such as user profiles and settings. Now, obviously, I can't use a database to store the data generated by this school project; otherwise, what's the point?

I'm thinking about storing the data in files, but that's the only idea I have right now and I'm kind of running dry. To be honest, I don't want to start banging out code and then discover a better way of doing it.

So if anyone has any idea how to store the data (like CSV?), or has some knowledge of how database applications work internally, can you please shed some light?

EDIT: just to be more clear, I can't use database engines to store the data. To put it this way, I'm coding a simple database engine. Ideas like what Galwegian, jkramer and Joe Skora suggested are what I'm looking for.

Sure, you could create your own database on top of the file system, since that is how actual databases are implemented. For example, you could store your data in raw data files with fixed- or variable-length records, and then create a separate index file holding file pointers (byte offsets) into the data file, so that queries on the indexed fields can jump straight to the matching records.

So yes, look at creating two files: one to store the data, and the other to store file pointers into that file, keyed by whatever fields you want to provide quick indexed access on.
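A minimal sketch of the two-file idea, assuming string keys and string values. The class name IndexedStore, its methods, and the on-disk layout are all made up for illustration; a real engine would not rewrite the whole index on every insert.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-file store: records live in the data file,
// and the index file maps each key to the record's byte offset.
class IndexedStore {
    private final File dataFile;
    private final File indexFile;
    private final Map<String, Long> index = new HashMap<>();

    IndexedStore(File dataFile, File indexFile) throws IOException {
        this.dataFile = dataFile;
        this.indexFile = indexFile;
        if (indexFile.length() > 0) {   // length() is 0 when the file does not exist
            loadIndex();
        }
    }

    // Append a record to the data file and remember where it starts.
    void put(String key, String value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dataFile, "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.writeUTF(value);        // length-prefixed, so records may vary in size
            index.put(key, offset);
        }
        saveIndex();                    // simple but wasteful: rewrites the whole index
    }

    // Look up the offset in the index, then seek straight to the record.
    String get(String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(dataFile, "r")) {
            raf.seek(offset);
            return raf.readUTF();
        }
    }

    private void saveIndex() throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(indexFile))) {
            out.writeInt(index.size());
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }
    }

    private void loadIndex() throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(indexFile))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                long offset = in.readLong();
                index.put(key, offset);
            }
        }
    }
}
```

Reopening the store with the same two files reloads the index, so get() still finds records written in an earlier run without scanning the data file.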

Best of luck. I'm betting you will learn a lot about database construction with this project.

What you probably want to use are random access files. Once you have a set of fields for a record, you can write them to disk as a block. You can keep an index separately, on disk or in memory, and access any record directly at any time. Hopefully that gives you enough to get started.

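I'm not sure I understand what you're asking, but wouldn't SQLite work for you (although it is still a database engine, which is probably what you want to avoid in the first place, so I'm not too sure)?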

I would create a database that uses binary tables, one file per table. Take a look at the very handy DataInputStream and DataOutputStream classes. Using them, you can easily go back and forth between binary files and Java types.

I would define a simple structure for the table: a header that describes the contents of the table, followed by the row data. Have each column in the table defined in the header: its name, data type, and maximum length. Keep it simple. Only handle a few data types, using the capabilities of DataInputStream and DataOutputStream as your guide. Use a simple file-naming convention to map table names to file names.
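As a rough sketch of such a layout, the header could be a column count followed by each column's name, type code, and maximum length, with the rows written after it. The class name TableFormat, the two type codes, and the exact layout are assumptions for illustration only.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical table layout: [column count][name, type, max length]... then rows.
class TableFormat {
    static final byte TYPE_INT = 1;     // stored with writeInt
    static final byte TYPE_STRING = 2;  // stored with writeUTF

    static class Column {
        final String name;
        final byte type;
        final int maxLength;

        Column(String name, byte type, int maxLength) {
            this.name = name;
            this.type = type;
            this.maxLength = maxLength;
        }
    }

    static void writeHeader(DataOutputStream out, List<Column> columns) throws IOException {
        out.writeInt(columns.size());
        for (Column c : columns) {
            out.writeUTF(c.name);
            out.writeByte(c.type);
            out.writeInt(c.maxLength);
        }
    }

    static List<Column> readHeader(DataInputStream in) throws IOException {
        int count = in.readInt();
        List<Column> columns = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            byte type = in.readByte();
            int maxLength = in.readInt();
            columns.add(new Column(name, type, maxLength));
        }
        return columns;
    }

    // Writes one row; the header decides how each value is encoded.
    static void writeRow(DataOutputStream out, List<Column> columns, Object[] row) throws IOException {
        for (int i = 0; i < columns.size(); i++) {
            Column c = columns.get(i);
            if (c.type == TYPE_INT) {
                out.writeInt((Integer) row[i]);
            } else {
                String s = (String) row[i];
                if (s.length() > c.maxLength) {
                    throw new IllegalArgumentException(c.name + " exceeds " + c.maxLength + " characters");
                }
                out.writeUTF(s);
            }
        }
    }

    static Object[] readRow(DataInputStream in, List<Column> columns) throws IOException {
        Object[] row = new Object[columns.size()];
        for (int i = 0; i < row.length; i++) {
            if (columns.get(i).type == TYPE_INT) {
                row[i] = Integer.valueOf(in.readInt());
            } else {
                row[i] = in.readUTF();
            }
        }
        return row;
    }
}
```

Because the reader parses the header first, the same readRow code can handle any table whose columns use the supported types.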

Create a test table with enough columns to have at least one of each data type. Then, create a simple way to populate tables with data, either by processing input files or via console input. Finally, create a simple way to display the contents of entire tables on the console.

After that, you can add a very simple SQL-like dialect to do queries. A simple query like this:

SELECT * FROM EMPLOYEES

...would require opening the file containing the EMPLOYEES table (found via your table file-naming convention), parsing the header, and reading through the entire table, returning the contents.

After you get that working, it will be simple to add other functionality, such as processing simple WHERE clauses that return only the rows (or columns within rows) matching certain criteria.
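To give a feel for how small the first version of the query layer can be, here is a sketch that recognizes only SELECT * FROM a table with an optional single equality condition. The class name MiniQuery, the ".tbl" file suffix, and the representation of rows as maps are assumptions; the rows are passed in already loaded, so reading them from the table file is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for: SELECT * FROM <table> [WHERE <column> = '<value>']
class MiniQuery {
    private static final Pattern SELECT = Pattern.compile(
        "\\s*SELECT\\s+\\*\\s+FROM\\s+(\\w+)(?:\\s+WHERE\\s+(\\w+)\\s*=\\s*'([^']*)')?\\s*",
        Pattern.CASE_INSENSITIVE);

    final String table;
    final String whereColumn;   // null when the query has no WHERE clause
    final String whereValue;

    private MiniQuery(String table, String whereColumn, String whereValue) {
        this.table = table;
        this.whereColumn = whereColumn;
        this.whereValue = whereValue;
    }

    static MiniQuery parse(String sql) {
        Matcher m = SELECT.matcher(sql);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported query: " + sql);
        }
        return new MiniQuery(m.group(1), m.group(2), m.group(3));
    }

    // Assumed naming convention: table EMPLOYEES lives in employees.tbl
    String fileName() {
        return table.toLowerCase(Locale.ROOT) + ".tbl";
    }

    // Full scan: keep every row when there is no WHERE, else only the matches.
    List<Map<String, String>> apply(List<Map<String, String>> rows) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (whereColumn == null || whereValue.equals(row.get(whereColumn))) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Anything beyond this one statement shape is rejected with an exception, which keeps the parser honest while you add features one at a time.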

If it's not necessary to have such a general-purpose solution (any number of tables, any number of columns, an actual query language, etc.), you can simply add methods to your API like:

Employee[] result = EmployeeDataManager.select("LASTNAME", "Smith");

...or something like that. If you build up slowly, dividing the functionality into several small tasks as I have suggested, you will soon have implemented all of the features you need.
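One way such a method might look is sketched below. Only the select call itself comes from the line above; the Employee fields, the insert method, and the in-memory list (standing in for rows loaded from the table file) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class Employee {
    final String firstName;
    final String lastName;

    Employee(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Hypothetical table-specific API; the list stands in for the table file.
class EmployeeDataManager {
    private static final List<Employee> EMPLOYEES = new ArrayList<>();

    static void insert(Employee e) {
        EMPLOYEES.add(e);
    }

    // Returns every employee whose named field equals the given value.
    static Employee[] select(String field, String value) {
        List<Employee> matches = new ArrayList<>();
        for (Employee e : EMPLOYEES) {
            if (value.equals(fieldValue(e, field))) {
                matches.add(e);
            }
        }
        return matches.toArray(new Employee[0]);
    }

    private static String fieldValue(Employee e, String field) {
        switch (field) {
            case "FIRSTNAME": return e.firstName;
            case "LASTNAME":  return e.lastName;
            default: throw new IllegalArgumentException("Unknown field: " + field);
        }
    }
}
```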

I suppose you could do a very simple proof-of-concept 'database' application using XML files, and maybe use XPath to query it.

It would be very slow compared to a database (depending on file size and hardware, of course), but it would work.
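For what it's worth, the XPath classes that ship with the JDK are enough for this. A minimal sketch, where the class name XmlTable and the element names in the usage note are made up, and the XML is passed in as a string rather than read from a file:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: treat an XML document as a table and query it with XPath.
class XmlTable {
    // Returns the text content of every node selected by the expression.
    static List<String> query(String xml, String xpathExpression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }
}
```

With rows stored as employee elements that have first and last children under an employees root, the expression /employees/employee[last='Smith']/first plays the role of a SELECT with a WHERE clause.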

The basics of storing records in blocks in data files have been around for decades. Obviously there are a great many variations on the theme, and all of them are designed to work around the fact that disk drives are slow.

But the fundamentals are not difficult. Combining fixed-length columns with a fixed number of columns gives you very rapid access to any record in your database.

From there, it's all offsets.

Let's take the example of a simple row containing ten 32-bit integers. A single row would be 40 bytes (4 bytes per integer * 10). If you want row 123 (counting rows from zero), simply multiply it by 40: 123 * 40 gives you an offset of 4920. Seek that far into the database file, read 40 bytes, and voila, you have a row from your database.
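The arithmetic above translates almost directly into RandomAccessFile calls. This is only a sketch: the class name FixedRowFile and its method names are made up, and rows are numbered from zero.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical fixed-width table: every row is ten 32-bit integers (40 bytes).
class FixedRowFile {
    static final int COLUMNS = 10;
    static final int ROW_BYTES = COLUMNS * Integer.BYTES;   // 10 * 4 = 40

    // Byte offset of a row, counting rows from zero.
    static long offsetOf(long rowNumber) {
        return rowNumber * ROW_BYTES;
    }

    static void writeRow(RandomAccessFile file, long rowNumber, int[] values) throws IOException {
        if (values.length != COLUMNS) {
            throw new IllegalArgumentException("Expected " + COLUMNS + " values");
        }
        file.seek(offsetOf(rowNumber));
        for (int v : values) {
            file.writeInt(v);
        }
    }

    static int[] readRow(RandomAccessFile file, long rowNumber) throws IOException {
        file.seek(offsetOf(rowNumber));
        int[] row = new int[COLUMNS];
        for (int i = 0; i < COLUMNS; i++) {
            row[i] = file.readInt();
        }
        return row;
    }
}
```

No index is needed here, because the row number alone determines where the row lives in the file.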

Indexes are typically stored in B+-trees, with tree nodes distributed across blocks on the disk. The power of the B+-tree is that you can easily find a single key value within the tree, and then simply walk the leaf nodes to scroll through the data in key order.
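Writing a disk-based B+-tree is a project in itself. To see the access pattern it provides (find one key, then walk forward in key order), an in-memory TreeMap can serve as a stand-in. To be clear, TreeMap is a red-black tree and not a B+-tree, and the class name OrderedIndex and its methods are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory stand-in for an ordered index: key -> byte offset of the record.
// Note: TreeMap is a red-black tree, NOT a B+-tree; only the access pattern matches.
class OrderedIndex {
    private final TreeMap<Integer, Long> offsetsByKey = new TreeMap<>();

    void put(int key, long offset) {
        offsetsByKey.put(key, offset);
    }

    // Point lookup: the offset for exactly this key, or null if absent.
    Long find(int key) {
        return offsetsByKey.get(key);
    }

    // Range scan: offsets for up to 'limit' keys >= startKey, in key order.
    List<Long> scanFrom(int startKey, int limit) {
        List<Long> result = new ArrayList<>();
        for (Long offset : offsetsByKey.tailMap(startKey, true).values()) {
            if (result.size() == limit) {
                break;
            }
            result.add(offset);
        }
        return result;
    }
}
```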

For a simple format that's useful and popular, consider looking up the original dBase format (DBF files). It has evolved somewhat over the years, but the foundation is quite simple and well documented, and there are lots of utilities that can work with it. It's a perfectly workable database format that addresses all of the fundamental issues of the problem.

If you were using C#, you could consider writing a simple LINQ to XML type of ORM.

You could use a serialization format like YAML, and store an array of hashes, where each hash is a table record and the keys in each hash are column names. You could then just load the serialized file into memory, work with arrays and hashes, and then store everything back.

I hope that's what you meant.

Can't you use a file-based database like HSQLDB to store your user settings etc.? This way you have a familiar interface to your data and are able to store it in the filesystem.

StackOverflow isn't for homework.

Having said that, here's the quick-and-dirty way to an efficient, flexible database.

  1. Design a nice Map (HashMap, TreeMap, whatever) that does what you want to do. Often, you'll have a "Record" class with your data, and a number of "Index" objects which are effectively Map<String,List<Record>> collections. (Why a list of records? Because an index on a not-very-selective field maps one key to many records.)

  2. Write a class to serialize your collections into files.

  3. Write a class to deserialize your collections from files.

  4. Write your query processing or whatever around the in-memory Java objects.

In-memory database.
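A bare-bones sketch of those four steps using Java's built-in serialization. The names PersonRecord and MemoryDb, the two fields, and the single last-name index are assumptions chosen for illustration (the record class is not simply called Record to avoid confusion with java.lang.Record in newer Java versions).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type holding the data.
class PersonRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String firstName;
    final String lastName;

    PersonRecord(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Hypothetical in-memory database: a record list plus one index on last name.
class MemoryDb implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<PersonRecord> records = new ArrayList<>();
    private final Map<String, List<PersonRecord>> byLastName = new HashMap<>();

    void insert(PersonRecord r) {
        records.add(r);
        // One key can map to many records, hence the list.
        byLastName.computeIfAbsent(r.lastName, k -> new ArrayList<>()).add(r);
    }

    List<PersonRecord> findByLastName(String lastName) {
        return byLastName.getOrDefault(lastName, Collections.emptyList());
    }

    int size() {
        return records.size();
    }

    // Step 2: serialize the collections into a file.
    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Step 3: deserialize the collections from a file.
    static MemoryDb load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (MemoryDb) in.readObject();
        }
    }
}
```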

Don't like Java's serialization? Get a JSON or YAML library and use those formats to serialize and deserialize.

"But an in-memory database won't scale," the purists whine. Take that up with SQLite, not me. My PC has 2 GB of RAM; that's a pretty big database. SQLite works.
