What is the best method to save large amounts of sequential data?

Original post 2015-04-30 14:17:59 5 1 sql-server/ database-design/ nosql

I searched but couldn't find a similar post; I apologize if I missed one and this is a duplicate.

I need to find the best mechanism to save data for the following requirement, and I would like to get your opinion.

The main requirement

We receive a lot of data from a collection of electronic sensors. The volume is about 50,000 records per second, and each record contains a floating-point value and a date/time stamp.

We also need to keep this data for at least 5 years and process it to make predictions.
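
To put the figures above in perspective, here is a rough back-of-envelope calculation in Python. The 16 bytes per record is an assumption (an 8-byte float plus an 8-byte timestamp); it ignores row overhead, indexes, compression, and any sensor identifier, so treat the result as a lower bound on raw payload rather than a real storage estimate.

```python
# Back-of-envelope sizing for the sensor stream described in the question.
RECORDS_PER_SECOND = 50_000   # from the question
RETENTION_YEARS = 5           # from the question
BYTES_PER_RECORD = 16         # assumption: 8-byte float + 8-byte timestamp

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365           # leap days ignored for simplicity


def total_records(rate: int, years: int) -> int:
    """Number of records produced at `rate` per second over `years` years."""
    return rate * SECONDS_PER_DAY * DAYS_PER_YEAR * years


def raw_terabytes(records: int, bytes_per_record: int) -> float:
    """Raw payload size in decimal terabytes (10**12 bytes)."""
    return records * bytes_per_record / 1e12


records = total_records(RECORDS_PER_SECOND, RETENTION_YEARS)
print(f"records over retention: {records:,}")   # 7,884,000,000,000
print(f"raw payload: {raw_terabytes(records, BYTES_PER_RECORD):.1f} TB")  # 126.1 TB
```

So even before any overhead, the five-year retention is on the order of trillions of rows and over a hundred terabytes, which is why archiving and record reduction matter here.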

Currently we are using MS SQL Server, but we are very keen to explore new areas such as NoSQL.

We can be flexible on these:

We wouldn't need a great deal of consistency, as the structure of the data is very simple.
We can manage atomicity from code when saving (if required).

We would need the DB end to be reliable on these:

Fast retrieval, so that it won't add much time to what is already required by the heavy prediction algorithms.
Reliability when saving: our middle tier will have to throw a lot of data at the DB at high speed, and the DB must be able to save all of it.
Data needs to be safe (durability).

I have been reading about this, and I am beginning to wonder whether we could use MS SQL and NoSQL in conjunction. What I am thinking of is to continue using MS SQL for regular use of the data and to use a NoSQL solution for long-term storage and processing.

As you may have realized by now, I am very new to NoSQL.

What do you think is the best way to store this much data while retaining performance and accuracy?

I would be very grateful if you could shed some light on this so we can provide an efficient solution to this problem.

We are also thinking about eliminating almost identical records that arrive close to each other (e.g. 45.9344563V, 45.9344565V, 45.9344562V arriving within 3 microseconds: we would ignore the first two and keep the third). Have any of you solved a similar problem before, and which algorithms did you use?
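
One possible reading of that rule, sketched in Python: group consecutive readings into a run while they stay within a time window and a value tolerance of the run's first reading, and keep only the last reading of each run. The function name `collapse_bursts`, the `max_gap_us` and `tolerance` parameters, and the choice to compare against the start of the run are all assumptions made for illustration; the sketch also assumes readings from a single sensor, already ordered by timestamp.

```python
from typing import Iterable, Iterator, Optional, Tuple

Reading = Tuple[int, float]  # (timestamp in microseconds, value)


def collapse_bursts(
    readings: Iterable[Reading], max_gap_us: int, tolerance: float
) -> Iterator[Reading]:
    """Yield the last reading of each run of near-identical, closely spaced readings.

    A reading joins the current run when it is within `max_gap_us` of the
    run's first timestamp and within `tolerance` of the run's first value.
    Comparing against the start of the run (not the previous reading)
    prevents a slow drift from being swallowed into one long run.
    """
    run_start: Optional[Reading] = None
    last: Optional[Reading] = None
    for reading in readings:
        if run_start is None:
            run_start = last = reading
            continue
        ts, value = reading
        same_run = (
            ts - run_start[0] <= max_gap_us
            and abs(value - run_start[1]) <= tolerance
        )
        if same_run:
            last = reading            # newer reading replaces the older ones
        else:
            yield last                # close the previous run
            run_start = last = reading
    if last is not None:
        yield last                    # flush the final run


sample = [(0, 45.9344563), (1, 45.9344565), (2, 45.9344562), (10, 46.1)]
print(list(collapse_bursts(sample, max_gap_us=3, tolerance=1e-6)))
# [(2, 45.9344562), (10, 46.1)]
```

Because it is a generator, this can sit in the middle tier and filter the stream before anything reaches the database. Whether dropping readings is acceptable for the prediction algorithms is a separate question that depends on what they need.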

I am not trying to get a complete solution here, just trying to start a dialog with other professionals out there. Please give your opinion.

Many thanks for your time; your opinion is greatly appreciated!

1 answer

NoSQL is pretty cool and will handle one of your requirements well (quick storage and non-relational retrieval). However, the problem with NoSQL comes when you start trying to use the data relationally, where it won't perform quite as well as an RDBMS.

There are several strategies you can use to handle large quantities of data in an RDBMS. The most obvious one that comes to mind is partitioning. You can read more about that for SQL Server here: https://msdn.microsoft.com/en-us/library/ms190787.aspx
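
As a rough illustration of time-based partitioning, the Python sketch below generates a T-SQL `CREATE PARTITION FUNCTION` statement with one boundary per month. The function name `pf_sensor_month`, the monthly granularity, and the `datetime2` column type are assumptions made for the example, not something from the question. A real setup would also need a partition scheme and a table created on that scheme, so check the generated statement against the SQL Server documentation before using it.

```python
from datetime import date
from typing import List


def month_starts(first: date, count: int) -> List[date]:
    """Return `count` consecutive first-of-month dates, starting with the month of `first`."""
    out = []
    year, month = first.year, first.month
    for _ in range(count):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def partition_function_ddl(name: str, boundaries: List[date]) -> str:
    """Build a RANGE RIGHT partition function; each boundary starts a new partition."""
    values = ", ".join(f"'{d.isoformat()}'" for d in boundaries)
    return (
        f"CREATE PARTITION FUNCTION {name} (datetime2) "
        f"AS RANGE RIGHT FOR VALUES ({values});"
    )


# Hypothetical name and start month, for illustration only.
ddl = partition_function_ddl("pf_sensor_month", month_starts(date(2015, 11, 1), 3))
print(ddl)
```

With `RANGE RIGHT`, each boundary value belongs to the partition on its right, so every month's data starts exactly at its boundary date. Queries that filter on a time range then only need to touch the partitions covering that range.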

You might also want to consider creating a job that periodically moves historical data that isn't accessed as often to a separate disk. This may enable you to use a new feature in SQL Server 2014 called In-Memory OLTP for the more heavily used recent data (assuming it's under 250 GB): https://msdn.microsoft.com/en-us/library/dn133186.aspx
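
To get a feel for how much "recent data" would fit under that 250 GB figure at the stated ingest rate, here is a small Python calculation. It assumes 16 bytes per record (an 8-byte float plus an 8-byte timestamp) and reads 250 GB as decimal gigabytes. Real in-memory tables carry per-row and index overhead, so the actual window would be shorter.

```python
RECORDS_PER_SECOND = 50_000          # from the question
BYTES_PER_RECORD = 16                # assumption: raw payload only, no row overhead
MEMORY_BUDGET_BYTES = 250 * 10**9    # the 250 GB figure, read as decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60


def hot_window_seconds(budget_bytes: int, rate: int, record_bytes: int) -> float:
    """Seconds of incoming data that fit in `budget_bytes` at the given rate."""
    return budget_bytes / (rate * record_bytes)


seconds = hot_window_seconds(MEMORY_BUDGET_BYTES, RECORDS_PER_SECOND, BYTES_PER_RECORD)
print(f"{seconds:,.0f} seconds, about {seconds / SECONDS_PER_DAY:.1f} days")
# 312,500 seconds, about 3.6 days
```

Under these assumptions the hot set covers only a few days of raw data, so the archiving job would have to run frequently unless the stream is reduced before it is stored.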