
Select rows from database where the value in one column is distinct and limit to 5 latest

I have a database of images. Each time an image is viewed, its row is updated with the viewer's IP address and its date_updated column is set to the current timestamp. I am trying to get the last 5 images viewed, but only one per distinct IP address, because I don't want one person to flood the last-viewed result.

Fiddle: http://sqlfiddle.com/#!2/d5b05/16

Desired result, when selecting from this data set:

SELECT * FROM `image` ORDER BY `date_updated` DESC;

|   IMAGE | WIDTH | HEIGHT | DATE_ADDED | DATE_UPDATED | UPDATED_BY_IP |
|---------|-------|--------|------------|--------------|---------------|
| 1x1XGY4 |  1920 |   1080 | 1417546414 |   1421712314 |   192.168.0.7 |
| 1x1XGY3 |  1920 |   1080 | 1417546413 |   1421712313 |   192.168.0.7 |
| 1x1XGY2 |  1920 |   1080 | 1417546412 |   1421712312 |  192.168.0.10 |
| 1x1XGY1 |  1920 |   1080 | 1417546411 |   1421712311 |  192.168.0.10 |
| 1oApS54 |  1920 |   1080 | 1417138874 |   1421685474 |   192.168.0.2 |
| 1oApS53 |  1920 |   1080 | 1417138873 |   1421685473 |   192.168.0.2 |
| 1oApS52 |  1920 |   1080 | 1417138872 |   1421685472 |  192.168.0.10 |
| 1oApS51 |  1920 |   1080 | 1417138871 |   1421685471 |  192.168.0.10 |
| 1ydhtQ4 |  1920 |   1080 | 1421460434 |   1421685154 |   192.168.0.6 |
| 1ydhtQ3 |  1920 |   1080 | 1421460433 |   1421685153 |   192.168.0.7 |
| 1ydhtQ2 |  1920 |   1080 | 1421460432 |   1421685152 |  192.168.0.10 |
| 1ydhtQ1 |  1920 |   1080 | 1421460431 |   1421685151 |   192.168.0.5 |
| 1WyQib4 |  1920 |   1080 | 1420869354 |   1421634384 |   192.168.0.8 |
| 1WyQib3 |  1920 |   1080 | 1420869353 |   1421634383 |   192.168.0.2 |
| 1WyQib2 |  1920 |   1080 | 1420869352 |   1421634382 |   192.168.0.3 |
| 1WyQib1 |  1920 |   1080 | 1420869351 |   1421634381 |  192.168.0.10 |
| 1izDqg4 |  1920 |   1080 | 1416948144 |   1421608564 |   192.168.0.2 |
| 1izDqg3 |  1920 |   1080 | 1416948143 |   1421608563 |   192.168.0.2 |
| 1izDqg2 |  1920 |   1080 | 1416948142 |   1421608562 |   192.168.0.5 |
| 1izDqg1 |  1920 |   1080 | 1416948141 |   1421608561 |  192.168.0.10 |

With a pseudo SELECT statement like this one, I would expect the result below:

SELECT * FROM image WHERE updated_by_ip IS DISTINCT ORDER BY date_updated DESC LIMIT 5

|   IMAGE | WIDTH | HEIGHT | DATE_ADDED | DATE_UPDATED | UPDATED_BY_IP |
|---------|-------|--------|------------|--------------|---------------|
| 1x1XGY4 |  1920 |   1080 | 1417546414 |   1421712314 |   192.168.0.7 |
| 1x1XGY2 |  1920 |   1080 | 1417546412 |   1421712312 |  192.168.0.10 |
| 1oApS54 |  1920 |   1080 | 1417138874 |   1421685474 |   192.168.0.2 |
| 1ydhtQ4 |  1920 |   1080 | 1421460434 |   1421685154 |   192.168.0.6 |
| 1ydhtQ1 |  1920 |   1080 | 1421460431 |   1421685151 |   192.168.0.5 |

Closest result:

The best I could come up with is:

SELECT DISTINCT updated_by_ip, MAX(date_updated) AS date_updated 
FROM `image` GROUP BY updated_by_ip ORDER BY date_updated DESC LIMIT 5;

This gives me:

| UPDATED_BY_IP | DATE_UPDATED |
|---------------|--------------|
|   192.168.0.7 |   1421712314 |
|  192.168.0.10 |   1421712312 |
|   192.168.0.2 |   1421685474 |
|   192.168.0.6 |   1421685154 |
|   192.168.0.5 |   1421685151 |

From that I could loop over the rows and fetch each image, something like:

while (SELECT DISTINCT updated_by_ip ...)
{
    $result_rows[] = SELECT * FROM image 
                    WHERE updated_by_ip = query[updated_by_ip] 
                    AND date_updated = query[date_updated] LIMIT 1
}

However, I was hoping to find a way to do this without a bunch of post-processing and additional queries. Also, selecting by updated_by_ip and date_updated doesn't seem very stable.

Thank you.

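It's not the prettiest query (it is invalid according to the SQL standard), but MySQL accepts it as long as the ONLY_FULL_GROUP_BY SQL mode is disabled. Be aware that MySQL is then free to return any row from each group, not necessarily the most recent one: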

SELECT * FROM `image`
GROUP BY updated_by_ip
ORDER BY `date_updated` DESC
LIMIT 5;

In Postgres you would use DISTINCT ON (...), but MySQL doesn't support that, so grouping by the columns you want to be distinct is the easiest workaround. The alternative is to use subqueries, but that tends to perform noticeably worse.
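On MySQL 8.0 and later, window functions are another way to get one row per IP without relying on the GROUP BY extension. The following is only a minimal sketch, not something from the answer above: it runs the query through Python's built-in sqlite3 module as a stand-in for MySQL (SQLite 3.25+ is assumed for ROW_NUMBER()), on a subset of the question's sample rows with the width, height and date_added columns left out. The helper names make_db and latest_per_ip are made up for the illustration.

```python
import sqlite3

# Subset of the question's sample rows: (image, date_updated, updated_by_ip).
ROWS = [
    ("1x1XGY4", 1421712314, "192.168.0.7"),
    ("1x1XGY3", 1421712313, "192.168.0.7"),
    ("1x1XGY2", 1421712312, "192.168.0.10"),
    ("1x1XGY1", 1421712311, "192.168.0.10"),
    ("1oApS54", 1421685474, "192.168.0.2"),
    ("1oApS53", 1421685473, "192.168.0.2"),
    ("1ydhtQ4", 1421685154, "192.168.0.6"),
    ("1ydhtQ1", 1421685151, "192.168.0.5"),
    ("1WyQib4", 1421634384, "192.168.0.8"),
    ("1WyQib2", 1421634382, "192.168.0.3"),
]

# Rank each IP's rows newest-first, keep rank 1, then take the newest N overall.
QUERY = """
SELECT image, date_updated, updated_by_ip
FROM (
    SELECT i.*,
           ROW_NUMBER() OVER (
               PARTITION BY updated_by_ip
               ORDER BY date_updated DESC
           ) AS rn
    FROM image i
) ranked
WHERE rn = 1
ORDER BY date_updated DESC
LIMIT ?
"""


def make_db(rows=ROWS):
    """Create an in-memory table holding the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image ("
        "image TEXT PRIMARY KEY, date_updated INTEGER, updated_by_ip TEXT)"
    )
    conn.executemany("INSERT INTO image VALUES (?, ?, ?)", rows)
    return conn


def latest_per_ip(conn, limit=5):
    """Return the newest row for each IP, newest first, capped at `limit` rows."""
    return conn.execute(QUERY, (limit,)).fetchall()


if __name__ == "__main__":
    for row in latest_per_ip(make_db()):
        print(row)
```

The SQL inside QUERY should carry over to MySQL 8.0 with the LIMIT written as a literal 5. If two rows for the same IP share a timestamp, adding a second sort key to the window's ORDER BY would make the pick deterministic.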

One approach is to use variables to enumerate the rows:

SELECT i.*
FROM (SELECT i.*,
             -- number each IP's rows newest first; 1 = that IP's latest view
             (@rn := if(@uip = updated_by_ip, @rn + 1,
                        if(@uip := updated_by_ip, 1, 1)
                       )
             ) AS seqnum
      FROM image i CROSS JOIN
           (SELECT @uip := '', @rn := 0) vars
      ORDER BY updated_by_ip, date_updated DESC
     ) i
WHERE seqnum = 1            -- keep only the latest row per IP
ORDER BY date_updated DESC
LIMIT 5;

To do this without the MySQL GROUP BY extension, you can try this:

First, obtain the most recent update time for each of the five most recently active IP addresses, with this subquery:

     SELECT updated_by_ip, MAX(date_updated) as date_updated
       FROM image  
      GROUP BY updated_by_ip
      ORDER BY 2 DESC
      LIMIT 5

If your table is big, an index on (updated_by_ip, date_updated) will help performance.
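As a rough illustration of that index, here is a small sketch that creates it and runs the subquery. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the plan output is SQLite's, not MySQL's (in MySQL you would prefix the query with EXPLAIN instead). The index name idx_image_ip_updated is made up, and the table is trimmed to the columns the subquery touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image ("
    "image TEXT PRIMARY KEY, date_updated INTEGER, updated_by_ip TEXT)"
)

# Composite index: IP first (the GROUP BY column), then the timestamp being MAXed.
# The index name is hypothetical; the CREATE INDEX syntax is the same in MySQL.
conn.execute(
    "CREATE INDEX idx_image_ip_updated ON image (updated_by_ip, date_updated)"
)

# A few of the question's sample rows.
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?)",
    [
        ("1x1XGY4", 1421712314, "192.168.0.7"),
        ("1x1XGY3", 1421712313, "192.168.0.7"),
        ("1x1XGY2", 1421712312, "192.168.0.10"),
        ("1oApS54", 1421685474, "192.168.0.2"),
        ("1ydhtQ4", 1421685154, "192.168.0.6"),
        ("1ydhtQ1", 1421685151, "192.168.0.5"),
        ("1WyQib4", 1421634384, "192.168.0.8"),
    ],
)

SUBQUERY = """
SELECT updated_by_ip, MAX(date_updated) AS date_updated
FROM image
GROUP BY updated_by_ip
ORDER BY 2 DESC
LIMIT 5
"""

# Show how SQLite plans the subquery; the exact wording varies by version.
for step in conn.execute("EXPLAIN QUERY PLAN " + SUBQUERY):
    print(step[-1])

top_ips = conn.execute(SUBQUERY).fetchall()
print(top_ips)
```

Both columns the subquery needs are in the index, which is why an index in this column order can help: the grouping can follow the index order instead of sorting the table.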

Then join the main table to that subquery to get your result.

SELECT i.*
  FROM image i
  JOIN (
         SELECT updated_by_ip, MAX(date_updated) as date_updated
           FROM image  
          GROUP BY updated_by_ip
          ORDER BY 2 DESC
          LIMIT 5
        ) m USING(updated_by_ip, date_updated)
ORDER BY i.date_updated DESC

See this: http://sqlfiddle.com/#!2/d5b05/21/0
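One caveat, related to the question's worry that matching on updated_by_ip and date_updated is not very stable: if one IP updates two images within the same second, the join matches both rows. The sketch below reproduces that with made-up rows (a1, a2, b1, b2 and the 10.0.0.x addresses are not from the question), again using Python's sqlite3 module as a stand-in for MySQL. The second query shows one possible tiebreak, which collapses ties with MAX(image); it is an assumption about what would be acceptable and returns only the image id, not the whole row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image ("
    "image TEXT PRIMARY KEY, date_updated INTEGER, updated_by_ip TEXT)"
)
# Hypothetical data: a1 and a2 were viewed by the same IP in the same second.
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?)",
    [
        ("a1", 100, "10.0.0.1"),
        ("a2", 100, "10.0.0.1"),
        ("b1", 90, "10.0.0.2"),
        ("b2", 80, "10.0.0.2"),
    ],
)

# The join from the answer: a tie on (ip, timestamp) yields two rows for one IP.
JOIN_QUERY = """
SELECT i.image, i.updated_by_ip, i.date_updated
FROM image i
JOIN (
    SELECT updated_by_ip, MAX(date_updated) AS date_updated
    FROM image
    GROUP BY updated_by_ip
    ORDER BY 2 DESC
    LIMIT 5
) m USING (updated_by_ip, date_updated)
ORDER BY i.date_updated DESC
"""

# One possible tiebreak: collapse ties by keeping the highest image id.
DEDUPED_QUERY = """
SELECT MAX(i.image) AS image, i.updated_by_ip, i.date_updated
FROM image i
JOIN (
    SELECT updated_by_ip, MAX(date_updated) AS date_updated
    FROM image
    GROUP BY updated_by_ip
    ORDER BY 2 DESC
    LIMIT 5
) m USING (updated_by_ip, date_updated)
GROUP BY i.updated_by_ip, i.date_updated
ORDER BY i.date_updated DESC
"""

with_ties = conn.execute(JOIN_QUERY).fetchall()
deduped = conn.execute(DEDUPED_QUERY).fetchall()
print(with_ties)
print(deduped)
```

With second-resolution timestamps such ties are possible, so the join can return more than five rows unless something breaks the tie.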
