How to optimize the PostgreSQL generate_series function

I have a query that uses the PostgreSQL generate_series function, but with large amounts of data the query can be slow. An example of the code that generates the query is below:

$yesterday = date('Y-m-d', strtotime('-1 day'));

$query = "
    WITH interval_step AS (
        SELECT gs::date AS interval_dt, random() AS r
        FROM generate_series('$yesterday'::timestamp, '2015-01-01', '1 day') AS gs)
    SELECT articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
    FROM development.articles JOIN interval_step ON articles.article_date_added::date = interval_step.interval_dt ";

if (isset($this -> registry -> get['category'])) {
    $query .= "
    JOIN development.feed_articles ON articles.article_id = feed_articles.article_id
    JOIN development.rss_feeds ON feed_articles.rss_feed_id = rss_feeds.rss_feed_id
    JOIN development.news_categories ON rss_feeds.news_category_id = news_categories.news_category_id
    WHERE news_category_name = $1";

    $params = array($category_name);
    $query_name = 'browse_category';
}

$query .= " ORDER BY interval_step.interval_dt DESC, RANDOM() LIMIT 20;";

The series only looks for content going back one day, and the results are sorted in random order. My question is: in what ways can generate_series be optimized to improve performance?

IMHO, try removing that random() from your ORDER BY clause. It probably has a much larger performance impact than you think. As things stand, the query is probably ordering the entire set by interval_dt desc, random() and only then picking the top 20. Not advisable...

Try fetching, e.g., 100 rows ordered by interval_dt desc instead, then shuffle them per the same logic and pick 20 in your app. Or wrap the entire thing in a subquery with limit 100, and re-order the outer query along the same lines.
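The app-side variant could look roughly like the sketch below. It assumes the rows were already fetched with ORDER BY interval_dt DESC LIMIT 100 (no random() in the ORDER BY) as associative arrays that include the interval_dt column; the function name pick_random_recent and its parameters are hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post). $rows is assumed to be
// the newest-first result of the query with "ORDER BY interval_dt DESC LIMIT 100".
function pick_random_recent(array $rows, int $limit = 20, string $dateKey = 'interval_dt'): array
{
    // Group rows by day; insertion order keeps the newest day first.
    $byDay = [];
    foreach ($rows as $row) {
        $byDay[$row[$dateKey]][] = $row;
    }

    $picked = [];
    foreach ($byDay as $dayRows) {
        // Random order within a single day, mirroring "interval_dt DESC, random()".
        shuffle($dayRows);
        foreach ($dayRows as $row) {
            if (count($picked) >= $limit) {
                return $picked;
            }
            $picked[] = $row;
        }
    }
    return $picked;
}
```

Note that the selection is random only among the rows that were fetched: if a single day has more than 100 articles, which 100 come back is up to the database.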

You don't need that generate_series at all. And do not concatenate query strings. Avoid it by making the parameter an empty string (or null) if it is not set:

if (!isset($this -> registry -> get['category']))
    $category_name = '';

$query = "
    select articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
    from
        development.articles
        inner join
        development.feed_articles using (article_id)
        inner join
        development.rss_feeds using (rss_feed_id)
        inner join
        development.news_categories using (news_category_id)
    where
        (news_category_name = $1 or $1 = '')
        and articles.article_date_added >= current_date - 1
    order by
        date_trunc('day', articles.article_date_added) desc,
        random()
    limit 20;
";

$params = array($category_name);

Passing $yesterday to the query is also unnecessary, since the date arithmetic can be done entirely in SQL.

If $category_name is empty, the condition is true for every row, so all categories are returned:

(news_category_name = $1 or $1 = '')
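Because the query text no longer changes, it can be executed the same way whether or not a category was requested. A minimal sketch, assuming a connection obtained from pg_connect() and the static query string shown above; the wrapper name fetch_articles is hypothetical:

```php
<?php
// Hypothetical wrapper (not from the original post). $conn is assumed to be
// a connection from pg_connect(); $query is the single static query string.
function fetch_articles($conn, string $query, ?string $categoryName): array
{
    // A missing category becomes '', which makes "$1 = ''" true for every row.
    $params = [$categoryName ?? ''];

    $result = pg_query_params($conn, $query, $params);
    if ($result === false) {
        throw new RuntimeException(pg_last_error($conn));
    }

    // Older PHP versions return false from pg_fetch_all() when there are no rows.
    return pg_fetch_all($result) ?: [];
}
```

Passing the value through pg_query_params keeps the category name out of the SQL text entirely.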
