Which DB should I use?

I am building an application that has to store and handle large amounts of data, and I'm struggling with the question of which DB to use.

My requirements are:

  • Handle up to ~100,000 insert commands a second (sometimes several at once from different threads). 100,000 is the peak; most of the time the rate would be between a few hundred and a few thousand.
  • Store millions of records.
  • Query the data as quickly as possible.
  • Some of the data properties differ from entity to entity, which fits non-relational databases better than relational ones. However, the total number of possible properties is not huge, so they could be represented as columns in a relational database (if it's much faster this way).
  • Update commands will rarely occur.

Which DB would you recommend?

Thanks!

Update: The OS I'm using isn't Windows. I thought that if SQL Server were the most recommended DB then I might switch, but from your responses this is not the case.

Regarding the budget: I will start with the cheapest option, and I guess this will change once the company has more money and more users.

No one has recommended NoSQL databases. Are they really that bad for this kind of requirement?

The answer depends on additional questions, such as how much you want to spend, what OS you are using, and what expertise you have in-house.

Databases that I know of that can handle such a massive scale include DB2, Oracle, Teradata, and SQL Server. MySQL may also be an option, though I'm not sure of its performance capabilities.

There are others, I'm sure, designed for handling data on the massive scale you are suggesting, and you may need to look into those as well.

So, if your OS is not Windows, you can exclude SQL Server.

If you are going on the cheap, MySQL may be the option.

DB2 and Oracle are both mature database systems. If your system is a mainframe (IBM 370), I'd recommend DB2, but on Unix-based systems either may be an option.

I don't know much about Teradata, but I know it is specifically designed for massive amounts of data, so it may be closer to what you are looking for.

A more complete list of choices can be found here: http://en.wikipedia.org/wiki/List_of_relational_database_management_systems

A decent comparison of databases is here: http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

100,000+ inserts a second is a huge number. No matter what you choose, you are looking at spending a fortune on hardware to handle this.

This is not a question about which DB to choose; it is a question about your skills and experience.

If you think this is possible with one physical machine, you are on the wrong track. If you know that several machines should be used, then why ask about the DB? The DB is not as important as the way you work with it.

Start with a write-only DB on one server and scale it vertically for now. Use several read-only servers and scale them horizontally (a document database can almost always be chosen safely here). The CQRS concept will answer many of your forthcoming questions.

"Handle up to ~100,000 insert commands a second" - is this peak, or normal operation? If normal operation, your 'millions of records stored' is likely to be billions...

With questions like this, I think it is useful to understand the business 'problem' further, as these are non-trivial requirements! The question is whether the problem justifies this 'brute force' approach, or whether there are alternative ways of looking at it that achieve the same goal.

If it is needed, then you can consider whether there are methods of aggregating or transforming the data (bulk loading, discarding multiple updates to the same record, or loading into multiple databases and then aggregating downstream as a combined set of ETLs, perhaps) to make this volume easier to manage.
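One of these ideas, collapsing multiple updates to the same record before they reach the database, can be sketched in a few lines of Python. This is only an illustration: the `(record_id, changed_fields)` shape and the `coalesce` helper are hypothetical, and the sketch merges updates field by field (keeping the latest value of each field) rather than dropping whole updates.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape of an incoming update: (record_id, changed_fields).
Update = Tuple[int, dict]

def coalesce(updates: Iterable[Update]) -> List[Update]:
    """Merge updates per record so that each record is written only once.

    Later updates override earlier ones field by field.
    """
    merged: Dict[int, dict] = {}
    for record_id, fields in updates:
        merged.setdefault(record_id, {}).update(fields)
    return list(merged.items())

incoming = [
    (1, {"status": "new"}),
    (2, {"status": "new"}),
    (1, {"status": "active", "score": 10}),
    (1, {"score": 12}),
]

# Four incoming updates collapse into two writes, one per record.
pending = coalesce(incoming)
print(pending)
```

Whether this is safe depends on the application: it assumes that intermediate states of a record never need to be stored.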

The first thing I would worry about is your disk layout. You have a mixed workload (OLTP and OLAP), so it is extremely important that your disks are sized and placed correctly in order to achieve this throughput. If your IO subsystem can't handle the load, then it doesn't matter what DB you use.

In addition, perhaps those 100,000 inserts a second can be bulk loaded. By the way, 100,000 rows a second amounts to 4,320,000,000 rows in just 12 hours, so perhaps you want to store billions of rows?
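The volume at a sustained peak rate is easy to check with a little arithmetic. The snippet below is a back-of-the-envelope sketch; the function name `rows_after` is made up for illustration, and it assumes the peak rate from the question is held constant, which the asker says is not the normal case.

```python
# Back-of-the-envelope volume estimate for a sustained insert rate.
PEAK_INSERTS_PER_SECOND = 100_000  # peak rate stated in the question
SECONDS_PER_HOUR = 3_600

def rows_after(hours: float, rate_per_second: int = PEAK_INSERTS_PER_SECOND) -> int:
    """Rows accumulated if the given rate were sustained for `hours`."""
    return int(rate_per_second * SECONDS_PER_HOUR * hours)

rows_12h = rows_after(12)
print(f"{rows_12h:,} rows in 12 hours")  # 4,320,000,000 rows in 12 hours
```

At a more typical rate of 1,000 inserts a second, the same calculation gives 43,200,000 rows in 12 hours, which is closer to the "millions of records" in the question.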

You probably can't handle 100k individual insert operations per second; you will certainly need to batch them into a more manageable number.
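As a rough sketch of what batching looks like in code, the example below groups rows into chunks and sends each chunk with a single `executemany` call and a single commit. It uses Python's built-in `sqlite3` module purely as a stand-in for whichever database is chosen; the `events` table, its columns, and the batch size of 5,000 are assumptions for illustration, not recommendations.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 5_000  # assumed value; tune it against the real database

def batched(rows: Iterable[tuple], size: int) -> Iterator[List[tuple]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def insert_in_batches(conn: sqlite3.Connection, rows: Iterable[tuple],
                      size: int = BATCH_SIZE) -> int:
    """Insert rows using one executemany call and one commit per batch."""
    batches = 0
    for chunk in batched(rows, size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events (entity_id, payload) VALUES (?, ?)", chunk
            )
        batches += 1
    return batches

# Hypothetical table and data, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (entity_id INTEGER, payload TEXT)")
rows = ((i, f"event-{i}") for i in range(12_000))

n_batches = insert_in_batches(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(n_batches, row_count)
```

The point is the shape of the work: 12,000 rows become 3 round trips and 3 commits instead of 12,000. Most server databases also offer a dedicated bulk-load path that is usually faster still.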

A single thread wouldn't be able to issue that many commands anyway, so I would expect there to be 100-1000 threads doing those inserts.

Depending on your app, you will probably need some kind of high availability as well, unless you're doing something like a scientific app.

My advice is to hire somebody who has a credible answer for you, ideally someone who's done it before. If you don't know, you're not going to be able to develop the app. Hire a senior developer who can answer this question. Ask them in their interview if you like.
