
What is the best method to save large amounts of sequential data

I searched but couldn't find a similar post; I apologize if I've missed one and created a duplicate here.

I need to find the best mechanism to store data for the following requirement, and I'd like to get your opinions.

The main requirement

We receive a lot of data from a collection of electronic sensors. The volume is about 50,000 records per second, and each record contains a floating-point value and a date/time stamp.

Also, we need to keep this data for at least 5 years and process them to make predictions.
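To make the scale concrete, here is a back-of-envelope sizing of the stated load. The 12-byte record size (8-byte timestamp + 4-byte float) is an assumption for illustration; real storage adds indexes and page overhead.

```python
# Back-of-envelope sizing: 50,000 records/s kept for 5 years,
# each record assumed to be an 8-byte timestamp + 4-byte float.
RECORDS_PER_SEC = 50_000
SECONDS_PER_YEAR = 365 * 24 * 3600
YEARS = 5
BYTES_PER_RECORD = 8 + 4  # assumption, excludes index/page overhead

total_records = RECORDS_PER_SEC * SECONDS_PER_YEAR * YEARS
total_tb = total_records * BYTES_PER_RECORD / 1e12

# total_records == 7_884_000_000_000 (about 7.9 trillion rows)
# total_tb      == 94.608 (roughly 95 TB raw, before overhead)
```

At this scale, the raw data alone is in the tens of terabytes, which is why partitioning, tiered storage, or down-sampling come up later in the discussion.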

Currently we are using MS SQL Server, but we are very keen to explore new areas like NoSQL.

We can be flexible on these:

  • We don't need a great deal of consistency, as the structure of the data is very simple
  • We can manage atomicity in code when saving (if required)

We need the database to be dependable on these:

  • Fast retrieval - so that it doesn't add much time to what's already required by the heavy prediction algorithms
  • Reliability when saving - our middle tier will throw a lot of data at it at high speed, and the database must keep up and save it all
  • Durability - the data must be safe
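For the high-speed ingest requirement above, batching writes on the middle tier is a common pattern regardless of which database sits behind it. Below is a minimal sketch of the idea in Python; the `flush_fn` callback is a placeholder for whatever actually performs the bulk insert (e.g. a bulk-copy API) and is an assumption, not part of the original setup:

```python
class BatchWriter:
    """Buffer incoming records and hand them off in bulk batches."""

    def __init__(self, flush_fn, batch_size=10_000):
        self.flush_fn = flush_fn      # performs the actual bulk insert
        self.batch_size = batch_size  # records per bulk write
        self.buffer = []

    def write(self, record):
        """Queue one record; flush automatically when the batch is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered records as one batch and clear the buffer."""
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
```

Flushing in batches of thousands amortizes per-statement overhead, which is usually the bottleneck long before raw disk throughput at rates like 50,000 records per second.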

I have been reading about this, and I am beginning to wonder if we could use both MS SQL and NoSQL in conjunction: continue using MS SQL Server for regular use of the data and use a NoSQL solution for long-term storage/processing.

As you may have realized by now, I am very new to NoSQL.

What do you think is the best way to store this much data while retaining performance and accuracy?

I would be very grateful if you could shed some light on this so we can provide an efficient solution to this problem.

We are also thinking about eliminating near-identical records that arrive close together (e.g. 45.9344563 V, 45.9344565 V and 45.9344562 V arriving within 3 microseconds; we would ignore the first two and keep the third). Has any of you solved a similar problem before, and if so, what algorithms did you use?
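One simple streaming approach to this kind of filtering is to hold back the latest record of a run and only emit it once the next record differs by more than a tolerance or arrives outside the time window. The sketch below keeps the last record of each run, matching the example above; the parameter names and threshold values (`eps`, `window_us`) are illustrative assumptions:

```python
def dedupe(records, eps=1e-6, window_us=3):
    """Collapse runs of near-identical readings, keeping the last one.

    `records` is an iterable of (timestamp_us, value) pairs in time
    order. A record is treated as a near-duplicate when its value is
    within `eps` of the pending record and it arrives within
    `window_us` microseconds of the previously seen record.
    """
    out = []
    pending = None  # latest record of the current run, not yet emitted
    for ts, value in records:
        if pending is not None:
            p_ts, p_value = pending
            if abs(value - p_value) <= eps and (ts - p_ts) <= window_us:
                pending = (ts, value)  # newer near-duplicate replaces it
                continue
            out.append(pending)  # run ended; emit its last record
        pending = (ts, value)
    if pending is not None:
        out.append(pending)  # emit the final run
    return out
```

With the three readings from the example above, the first two are discarded and only the third survives, followed by whatever distinct reading comes next.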

I am not trying to get a complete solution here, just trying to start a dialogue with other professionals out there. Please give your opinion.

Many thanks for your time, your opinion is greatly appreciated!

NoSQL is pretty cool and will handle one of your requirements well (quick storage and non-relational retrieval). The problem, however, comes when you start trying to use the data relationally, where NoSQL won't perform as well as an RDBMS.

When storing large quantities of data in an RDBMS, there are several strategies you can use. The most obvious one that comes to mind is partitioning. You can read more about that for SQL Server here: https://msdn.microsoft.com/en-us/library/ms190787.aspx

You might also want to consider creating a job that periodically moves historical data that isn't accessed as often to a separate disk. This may let you use a new SQL Server 2014 feature called In-Memory OLTP for the more heavily used recent data (assuming it's under 250 GB): https://msdn.microsoft.com/en-us/library/dn133186.aspx


 