I have a query that uses PostgreSQL generate_series function but when it comes to large amounts of data, the query can be slow. An example of code the generates the query is below:
$yesterday = date('Y-m-d',(strtotime ( '-1 day' ) ));
$query = "
WITH interval_step AS (
SELECT gs::date AS interval_dt, random() AS r
FROM generate_series('$yesterday'::timestamp, '2015-01-01', '1 day') AS gs)
SELECT articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
FROM development.articles JOIN interval_step ON articles.article_date_added::date=interval_step.interval_dt ";
if (isset($this -> registry -> get['category'])) {
$query .= "
JOIN development.feed_articles ON articles.article_id = feed_articles.article_id
JOIN development.rss_feeds ON feed_articles.rss_feed_id = rss_feeds.rss_feed_id
JOIN development.news_categories ON rss_feeds.news_category_id = news_categories.news_category_id
WHERE news_category_name = $1";
$params = array($category_name);
$query_name = 'browse_category';
}
$query .= " ORDER BY interval_step.interval_dt DESC, RANDOM() LIMIT 20;";
This series looks for only content that goes one day back and sorts the results in random order. My question is what are was that generate_series can be optimized to improve performance?
Imho, try removing that random()
in your order by
statement. It probably has a much larger performance impact than you think. As things are it's probably ordering the entire set by interval_dt desc, random()
, and then picking the top 20. Not advisable...
Try fetching eg 100 rows ordered by interval_dt desc
instead, then shuffle them per the same logic, and pick 20 in your app. Or wrap the entire thing in a subquery limit 100
, and re-order accordingly along the same lines.
You don't need that generate_series
at all. And do not concatenate query strings. Avoid it by making the parameter an empty string (or null) if it is not set:
if (!isset($this -> registry -> get['category']))
$category_name = '';
$query = "
select articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
from
development.articles
inner join
development.feed_articles using (article_id)
inner join
development.rss_feeds using (rss_feed_id)
inner join
development.news_categories using (news_category_id)
where
(news_category_name = $1 or $1 = '')
and articles.article_date_added >= current_date - 1
order by
date_trunc('day', articles.article_date_added) desc,
random()
limit 20;
";
$params = array($category_name);
Passing $yesterday
to the query is also not necessary as it can be done entirely in SQL.
If $category_name
is empty it will return all categories:
(news_category_name = $1 or $1 = '')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.