
Which DB should I select if performance of Postgres is low?

In a web app that supports more than 5000 users, Postgres is becoming the bottleneck.

It takes more than 1 minute to add a new user (even after optimizations, and on Win 2k3).

So, as a design issue, which other DBs might be better?

Most likely it's not PostgreSQL, it's your design. Changing shoes will most likely not make you a better dancer.

Do you know what is causing the slowness? Is it contention, time to update indexes, seek times? Are all 5000 users trying to write to the user table at the exact same moment you are trying to insert the 5001st user? That, I can believe, could cause a problem. You might have to go with something tuned for handling extreme concurrency, like Oracle.

MySQL (I am told) can be optimized for faster reads than PostgreSQL, but both are ridiculously fast in terms of the number of transactions/sec they support, and it doesn't sound like that's your problem.


PS: We were having a little discussion in the comments on a different answer. Do note that some of the biggest databases in the world, storage-wise, are implemented using Postgres (though they tend to tweak the internals of the engine). Postgres scales extremely well for data size, better than most for concurrency, and is very flexible in terms of what you can do with it.

I wish there was a better answer for you. Thirty years after the technology was invented, we should be able to let users have less detailed knowledge of the system in order to have it run smoothly. But alas, extensive thinking and tweaking is required for all the products I am aware of. I wonder if the creators of StackOverflow could share how they handled DB concurrency and scalability? They are using SQL Server, I know that much.


PPS: As chance would have it, I slammed head-first into a concurrency problem in Oracle yesterday. I am not totally sure I have it right, not being a DBA, but what the guys explained was something like this: we had a large number of processes connecting to the DB and examining the system dictionary, which apparently forces a short lock on it, despite the fact that it's just a read. Parsing queries does the same thing, so we had (on a multi-terabyte system with thousands of objects) a lot of forced wait time because processes were locking each other out of the system. Our system dictionary was also excessively big because it contains a separate copy of all the information for each partition, of which there can be thousands per table. This is not really related to PostgreSQL, but the takeaway is: in addition to checking your design, make sure your queries use bind variables and get reused, and that pressure on shared resources is minimal.
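
The bind-variable advice carries over to PostgreSQL as well. A minimal sketch (the table, column, and statement names below are hypothetical):

```sql
-- A query sent with inline literals is parsed and planned on every call:
SELECT * FROM users WHERE email = 'alice@example.com';

-- A prepared statement with a bind parameter is parsed and planned once,
-- then reused for each EXECUTE:
PREPARE find_user (text) AS
    SELECT * FROM users WHERE email = $1;
EXECUTE find_user('alice@example.com');
```

Most client drivers expose the same mechanism through their parameterized-query APIs, which also protects against SQL injection.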

Please change the OS that Postgres runs on. The Windows port, while very useful for expanding the user base, is still not on par with the (older and more mature) Un*x ports, especially the Linux port.

I think your best choice is still PostgreSQL. Spend the time to make sure you have properly tuned your application. After you're confident you have reached the limits of what tuning can do, start caching everything you can. After that, start thinking about moving to an asynchronous master-slave setup. Also, are you running OLAP-type functionality on the same database you're doing OLTP on?

Let me introduce you to the simplest, most practical way to scale almost any database server, if the database design is truly optimal: just double your RAM for an instantaneous boost in performance. It's like magic.

PostgreSQL scales better than most. If you are going to stay with a relational DB, Oracle would be it. ODBMSs scale better, but they have their own issues, in that setting one up is closer to programming.
Yahoo uses PostgreSQL; that should tell you something about its scalability.

As highlighted above, the problem is not with the particular database you are using, i.e. PostgreSQL, but with one of the following:

  • Schema design: maybe you need to add, remove, or refine your indexes
  • Hardware: maybe you are asking too much of your server; you said 5k users, but then again very few of them are probably querying the DB at the same time
  • Queries: perhaps poorly written, resulting in lots of inefficiency
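
On the index point: every index on a table must be updated on each INSERT, so slow inserts are sometimes an index problem in either direction. A hedged sketch (table and column names are hypothetical):

```sql
-- List the indexes carried by the table that is slow to insert into;
-- a rarely used index here is pure write overhead:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'users';

-- Conversely, a missing index can slow the lookups performed during the
-- insert itself (uniqueness checks, foreign-key checks):
CREATE INDEX users_email_idx ON users (email);
```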

A pragmatic way to find out what is happening is to analyse the PostgreSQL log files and find out which queries are:

  • Most frequently executed
  • Longest running
  • etc.
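
To get those queries into the log in the first place, something like the following in postgresql.conf is a common starting point (assuming a version that supports these settings; the 200 ms threshold is a judgment call, not a recommendation):

```
# Log every statement that runs longer than 200 ms
log_min_duration_statement = 200

# A line prefix with timestamp and process id, in a format that log
# analyzers such as pgFouine can parse
log_line_prefix = '%t [%p]: '
```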

A quick review will tell you where to focus your efforts, and you will most likely resolve your issues fairly quickly. There is no silver bullet; you have to do some homework, but this will be small compared to changing your DB vendor.

Good news: there are lots of utilities for analysing your log files that are easy to use and produce easy-to-interpret results. Here are two:

pgFouine - a PostgreSQL log analyzer (PHP)

PQA (Ruby)

First, I would make sure the optimizations are, indeed, useful. For example, if you have many indexes, sometimes adding or modifying a record can take a long time. I know there are several big projects running on PostgreSQL, so take a look at this issue.

I'd suggest looking here for information on PostgreSQL's performance: http://enfranchisedmind.com/blog/2006/11/04/postgres-for-the-win

What version of PG are you running? As the releases have progressed, performance has improved greatly.

I had the same issue previously at my current company. When I first joined, their queries were huge and very slow, taking 10 minutes to run. I was able to optimize them down to a few milliseconds, or 1 to 2 seconds. I learned many things during that time, and I will share a few highlights.

  1. Check your query first. Doing an inner join of all the tables you need will always take some time. One thing I would suggest is to always start off with the table that lets you cut your data down to just what you need.

    e.g. SELECT * FROM (SELECT * FROM person WHERE name ILIKE '%abc') AS person;

If you look at the example above, this will cut your results down to something you know you need, and you can refine them further with an inner join. This is one of the best ways to speed up your query, but there is more than one way to skin a cat. I cannot explain all of them here because there are just too many, but from the example above, you just need to modify it to suit your needs.

  2. It depends on your Postgres version. Older Postgres versions do less internal query optimization; for example, on Postgres 8.2 and below, IN statements are slower than on 8.4.

  3. EXPLAIN ANALYZE is your friend. If your query is running slow, do an EXPLAIN ANALYZE to determine which part of it is causing the slowness.

  4. Vacuum your database. This ensures that the statistics on your database closely match the actual data. A big difference between the statistics and the actual data will result in your query running slow.

  5. If none of these help, try modifying your postgresql.conf. Increase the shared memory and experiment with the configuration to better suit your needs.
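
The EXPLAIN ANALYZE, VACUUM, and postgresql.conf suggestions above can be sketched as follows; the table name is hypothetical, and the configuration values are illustrative starting points only:

```sql
-- See where the time in a slow statement actually goes:
EXPLAIN ANALYZE SELECT * FROM person WHERE name ILIKE '%abc';

-- Reclaim dead rows and refresh the planner's statistics so its
-- cost estimates match reality:
VACUUM ANALYZE person;
```

```
# postgresql.conf: give the server more memory to work with
shared_buffers = 256MB   # shared cache (very old versions take this in 8 kB pages)
work_mem = 16MB          # memory per sort/hash operation
```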

Hope this helps, but of course these are just Postgres optimizations.

BTW, 5000 users is not much. My DB contains about 200k to a million users.

If you do want to switch away from PostgreSQL, Sybase SQL Anywhere is number 5 in terms of price/performance on the TPC-C benchmark list. It's also by far the lowest-priced option on the top-10 list, and is the only non-Microsoft, non-Oracle entry.

It can easily scale to thousands of users and terabytes of data.

Full disclosure: I work on the SQL Anywhere development team.

We need more details: what version are you using? What is the memory usage of the server? Are you vacuuming the database? Your performance problems might be unrelated to PostgreSQL.

If you have many more reads than writes, you may want to try MySQL, assuming the problem is with Postgres; but your problem sounds like a write problem.

Still, you may want to look at your database design, and possibly consider sharding. For a really large database, you may have to look at the above two issues regardless.

You may also want to look at non-RDBMS database servers, or document-oriented ones like Mnesia and CouchDB, depending on the task at hand. No single tool will manage all tasks, so choose wisely.

Just out of curiosity, do you have any stored procedures that may be causing this delay?
