postgres: get random entries from table - too slow

In my postgres database, I have the following relationships (simplified for the sake of this question):

Objects (currently has about 250,000 records)
-------
n_id
n_store_object_id (references store.n_id, 1-to-1 relationship, some objects don't have store records)
n_media_id (references media.n_id, 1-to-1 relationship, some objects don't have media records)

Store (currently has about 100,000 records)
-----
n_id
t_name,
t_description,
n_status,
t_tag

Media
-----
n_id
t_media_path

So far, so good. When I need to query the data, I run this (note the limit 2 at the end, as part of the requirement):

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2

This works fine and gives me two entries back, as expected. The execution time is about 20 ms, which is just fine.

Now I need to get 2 random entries every time the query runs. I thought I'd add order by random(), like so:

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
order by
    random()
limit
    2

While this gives the right results, the execution time is now about 2,500 ms (over 2 seconds). This is clearly not acceptable, as it's one of a number of queries that run to get data for a page in a web app.

So, the question is: how can I get random entries, as above, but still keep the execution time reasonable (i.e. under 100 ms is acceptable for my purpose)?

Of course, it needs to sort the whole result set by the random criterion before it can return the first rows. Maybe you can work around this by using random() in offset instead?
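A minimal sketch of the offset idea, assuming you already know roughly how many rows the join produces (the 1000 below is a placeholder, not a measured value, and it has to stay at least 2 below the real row count to always get two rows back). Without an order by, the row order is unspecified, so this returns two adjacent rows from whatever order the plan happens to produce, and Postgres still has to step over the skipped rows:

```sql
select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2
offset
    -- random() is evaluated once here; 1000 is an assumed upper bound
    floor(random() * 1000)::bigint
```

The cost grows with the offset, so this is likely to help most when the upper bound is small compared with the table.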

Here's some previous work on the topic which may prove helpful:

http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/

I'm thinking you'll be better off selecting random objects first, then performing the join to those objects after they're selected. I.e., query once to select random objects, then query again to join just those objects that were selected.
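One way to sketch this in a single statement is to pick the random objects in a subquery and join only those. The oversampling size of 50 is an arbitrary assumption: it is there because some objects have no store or media record, or fail the status/tag filter, so a sample of exactly 2 would often come back short. Even with oversampling, fewer than two rows can be returned.

```sql
select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    (
        -- pick a small random sample of objects before joining;
        -- 50 is an assumed oversampling size, tune it to your data
        select n_id, n_store_object_id, n_media_id
        from objects
        order by random()
        limit 50
    ) o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2
```

The inner query still reads every row of objects, so whether this is fast enough depends on the plan; what it avoids is joining and sorting the full joined result.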

It seems like your problem is this: you have a table with 250,000 rows and need two random rows. Thus, you have to generate 250,000 random numbers and then sort the rows by their numbers. Two seconds to do this seems pretty fast to me.

The only real way to speed up the selection is to avoid having to come up with 250,000 random numbers, and instead look up rows through an index.

I think you'd have to change the table schema to optimize for this case. How about something like:

  • 1) Create a new column with a sequence starting at 1.
  • 2) Every row will then have a number.
  • 3) Create an index on: number % 1000
  • 4) Query for rows where number % 1000 is equal to a random number between 0 and 999 (this should hit the index and load a random portion of your table).
  • 5) You can probably then add RANDOM() to your ORDER BY clause; it will then sort just that chunk of your table (about 1/1,000 of the rows) and be roughly 1,000x faster.
  • 6) Then select the first two of those rows.
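The steps above could look roughly like this in PostgreSQL. The column name n_rand_seq and the index name are hypothetical, and note that adding a bigserial column rewrites the table and numbers the existing rows from 1. The random bucket is wrapped in a scalar subquery so that it is computed once for the whole query rather than once per row:

```sql
-- steps 1-2: add a sequential number to every row (hypothetical column name)
alter table objects add column n_rand_seq bigserial;

-- step 3: expression index on the bucket, number % 1000
create index objects_rand_bucket_idx on objects ((n_rand_seq % 1000));

-- steps 4-6: pick one random bucket, shuffle only that bucket, take two rows
select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
where
    o.n_rand_seq % 1000 = (select floor(random() * 1000)::bigint)
order by
    random()
limit
    2;
```

With about 250,000 objects, each bucket holds around 250 rows before the joins filter some out, so a bucket can occasionally yield fewer than two results.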

If this still isn't random enough (since the two rows will always share the same "hash"), you could probably do a union of two random rows, or have an OR clause in the query and generate two random keys.
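A sketch of the OR variant, assuming the numbered column from steps 1-3 exists (called n_rand_seq here, a hypothetical name) along with its index on n_rand_seq % 1000. Each scalar subquery should be evaluated once, giving two independently chosen buckets, which may occasionally coincide:

```sql
select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
where
    -- two random keys, one per subquery
    o.n_rand_seq % 1000 = (select floor(random() * 1000)::bigint)
    or o.n_rand_seq % 1000 = (select floor(random() * 1000)::bigint)
order by
    random()
limit
    2;
```

The two returned rows can still come from the same bucket; this only makes it possible for them to come from different ones.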

Hopefully something along these lines could be very fast and decently random.
