
Storing a large amount of data in a database

I have a question regarding the storage of a large amount of data. The situation is the following:

  1. I want to store:

    • GPS coordinates (latitude and longitude), every minute or even at a shorter interval, but I'm considering every minute
    • An event, which can be repeated for several coordinates
    • A datetime or timestamp for each entry (I don't know which is better in my case)
    • A user id
  2. I want to be able to query:

    • Events by zone (defined by a latitude and longitude range, for example from (1,1) to (2,2))
    • User tracking from date X to date Y (for one or more users)

So far I have been thinking about these solutions:

Solution 1

id_user (int)
id_experience (int)
id_event (int)
dt (datetime)
latitude (decimal)
longitude (decimal)
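With this flat schema, both queries from the list above map directly onto plain SQL. A minimal sketch using SQLite for illustration (the table name `tracking` is an assumption; any SQL database would look similar):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tracking (
        id_user INTEGER,
        id_experience INTEGER,
        id_event INTEGER,
        dt TEXT,            -- ISO-8601 datetime, so BETWEEN works lexically
        latitude REAL,
        longitude REAL
    )
""")

# A few sample points for two users
rows = [
    (1, 10, 100, "2020-01-01 12:00:00", 1.5, 1.5),
    (1, 10, 100, "2020-01-01 12:01:00", 1.6, 1.7),
    (2, 11, 101, "2020-01-02 09:00:00", 3.0, 3.0),
]
conn.executemany("INSERT INTO tracking VALUES (?, ?, ?, ?, ?, ?)", rows)

# Query 1: events by zone, e.g. the box from (1,1) to (2,2)
in_zone = conn.execute("""
    SELECT DISTINCT id_event FROM tracking
    WHERE latitude BETWEEN 1 AND 2 AND longitude BETWEEN 1 AND 2
""").fetchall()

# Query 2: track one or more users between date X and date Y
track = conn.execute("""
    SELECT id_user, dt, latitude, longitude FROM tracking
    WHERE id_user IN (1, 2) AND dt BETWEEN ? AND ?
    ORDER BY id_user, dt
""", ("2020-01-01 00:00:00", "2020-01-01 23:59:59")).fetchall()

print(in_zone)    # [(100,)]
print(len(track)) # 2
```

For these queries to stay fast as the table grows, indexes on (latitude, longitude) and on (id_user, dt) would be needed.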

I did some calculations, and it comes to something like: around 500 entries per day per user; since I am preparing the application for some load, there may be around 100-150 users, which means about 75,000 entries per day; after one month there will be millions of entries.

Probably, Solution 1 is not a good solution, since the size of the database will grow very fast.

Solution 2

Have 2 tables, one of which aggregates coordinates according to event. For example, I have the event "dinner", which takes 30 minutes, so its 30 entries will be grouped into one field of BLOB type. This table will look like:

id_user (int)
id_experience (int)
id_event (int)
dt (datetime)
coordinates(blob)

And another table, which has pre-calculated locations with some "width" and "length", with a pointer to the first table:

latitude (decimal)
longitude (decimal)
id_entry_in_first_table (int)

This solution only partially solves my problem; imagine that some events last no more than a few minutes, and then the second table is still needed.

Solution 3

This is probably not a very correct solution, but it seems to make some sense. I have users associated with some kind of experience, which has a start date and an end date. When an experience ends, I will create a dump of the data for that experience, save it to a file, and delete the entries related to that experience. When a user wants to consult an "archived" experience, I will load the data into some temporary table and drop it within one day (for example). In this case I would store the data according to Solution 1.
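The archive-and-restore cycle described above can be sketched roughly as follows. This is only a sketch under assumptions: the `tracking` table from Solution 1, CSV as the dump format, and SQLite standing in for the real database:

```python
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracking "
             "(id_user, id_experience, id_event, dt, latitude, longitude)")
conn.executemany("INSERT INTO tracking VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 100, "2020-01-01 12:00:00", 1.5, 1.5),
    (1, 10, 100, "2020-01-01 12:01:00", 1.6, 1.7),
    (1, 11, 101, "2020-01-02 09:00:00", 3.0, 3.0),
])

def archive_experience(conn, exp_id, path):
    """Dump all rows of one experience to a CSV file, then delete them."""
    rows = conn.execute(
        "SELECT * FROM tracking WHERE id_experience = ?", (exp_id,)).fetchall()
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    conn.execute("DELETE FROM tracking WHERE id_experience = ?", (exp_id,))

def restore_experience(conn, path):
    """Load an archived experience into a temporary table for consultation."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS tracking_archive "
                 "(id_user, id_experience, id_event, dt, latitude, longitude)")
    with open(path, newline="") as f:
        conn.executemany(
            "INSERT INTO tracking_archive VALUES (?, ?, ?, ?, ?, ?)",
            csv.reader(f))

archive_experience(conn, 10, "experience_10.csv")
restore_experience(conn, "experience_10.csv")
live = conn.execute("SELECT COUNT(*) FROM tracking").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM tracking_archive").fetchone()[0]
print(live, archived)  # 1 2
```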

The main question is: are any of the presented solutions acceptable in terms of database performance? Is there any better solution for my problem?

"Millions of entries" sounds like a lot, but this is what databases are designed to handle. However you design it, if you optimise it according to how you want to extract results from it later (as that is what will take the time, as opposed to the inserts), then you're good to go.

Having said that, of course... if you have lots of users doing lots of things to your database at the same time, then I think your server/bandwidth will give out before your database does!

I would choose a master-detail approach.

Two advantages:

  1. You don't have redundant entries (1 master row and x child rows with coordinates).

  2. It is still easy to query (in contrast to the blob approach).

     SELECT m.id_user, m.id_experience, m.id_event, c.latitude, c.longitude
     FROM master_table m
     LEFT JOIN child_table c ON m.id = c.master_table_id

And this should be pretty fast even with many millions of records in the master table, if you set up a foreign key or an index on master_table_id.
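A minimal sketch of this master-detail layout, with the index on the child's foreign key that the answer recommends (SQLite used for illustration; the column names follow the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_table (
        id INTEGER PRIMARY KEY,
        id_user INTEGER,
        id_experience INTEGER,
        id_event INTEGER,
        dt TEXT
    );
    CREATE TABLE child_table (
        master_table_id INTEGER REFERENCES master_table(id),
        latitude REAL,
        longitude REAL
    );
    -- The index is what keeps the join fast on millions of rows.
    CREATE INDEX idx_child_master ON child_table (master_table_id);
""")

# One master row ("event") with two coordinate detail rows
conn.execute(
    "INSERT INTO master_table VALUES (1, 1, 10, 100, '2020-01-01 12:00:00')")
conn.executemany("INSERT INTO child_table VALUES (?, ?, ?)",
                 [(1, 1.5, 1.5), (1, 1.6, 1.7)])

rows = conn.execute("""
    SELECT m.id_user, m.id_experience, m.id_event, c.latitude, c.longitude
    FROM master_table m
    LEFT JOIN child_table c ON m.id = c.master_table_id
""").fetchall()
print(rows)
```

The event metadata lives once in the master row, while each coordinate costs only three small columns in the child table, and unlike a BLOB the coordinates remain individually queryable.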

You probably want to read this: http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html

Broadly speaking, as long as you can use indexes in your queries, huge tables aren't an issue - billions of records can be queried on consumer-grade laptops. You should have an archiving strategy if you intend to scale to huge numbers of historical records, but it's not a huge priority.

Far more tricky is supporting your desire to find events within a certain geographic boundary; it's easy for this to break your indexing strategy in all sorts of nasty ways. If you have to query based on mathematical operations, you may not be able to use an index - so finding users within a 1-mile radius might require evaluating the circle formula for every record in your database table.
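Without spatial indexes, one common workaround is a rectangular pre-filter that plain B-tree indexes on latitude and longitude can serve, followed by an exact distance check on only the surviving rows. A sketch (the haversine formula and the 1-mile radius are just for illustration; in SQL the box becomes two indexable BETWEEN conditions):

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_miles):
    """Rectangle enclosing the circle; only rows inside it need the exact
    (index-unfriendly) distance formula."""
    dlat = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    dlon = dlat / math.cos(math.radians(lat))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

# Candidate rows as (lat, lon); in SQL the box filter would be
#   WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?
center = (40.0, -75.0)
points = [(40.001, -75.001), (40.5, -75.5), (40.01, -75.0)]
lat_min, lat_max, lon_min, lon_max = bounding_box(*center, 1.0)
candidates = [p for p in points
              if lat_min <= p[0] <= lat_max and lon_min <= p[1] <= lon_max]
within = [p for p in candidates
          if haversine_miles(*center, *p) <= 1.0]
print(within)
```

The box filter can use ordinary indexes and discards the vast majority of rows; the exact formula then runs over a handful of candidates instead of the whole table.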

The spatial extensions offer a solution for this - but they're not "free"; you have to optimize your design specifically for them.
