简体   繁体   中英

How to remove duplicates from a SQL query based on a single column

There's a query I need to modify. What the query currently does is return search results (ads) based on Ad Title and Ad Description . If any of the search words are either found in ad title or ad description, it returns those results

I want to modify the query so that each ad appears in search results only once for a given ad title... So if there were 5 ads found with the same ad title for the given words in the search , it should return only 1 ad for that ad title...

$sql = "SELECT a.*, UNIX_TIMESTAMP(a.createdon) AS timestamp, ct.cityname,
                    COUNT(*) AS piccount, p.picfile,
                    scat.subcatname, cat.catid, cat.catname $xfieldsql
                FROM t_ads a
                    INNER JOIN t_cities ct ON a.cityid = ct.cityid
                    INNER JOIN t_subcats scat ON a.subcatid = scat.subcatid
                    INNER JOIN t_cats cat ON scat.catid = cat.catid
                    LEFT OUTER JOIN t_adxfields axf ON a.adid = axf.adid
                    LEFT OUTER JOIN t_adpics p ON a.adid = p.adid AND p.isevent = '0'
                    LEFT OUTER JOIN t_featured feat ON a.adid = feat.adid AND feat.adtype = 'A'
                WHERE $where
                    AND $visibility_condn
                    AND (feat.adid IS NULL OR feat.featuredtill < NOW())
                    $loc_condn
                GROUP BY a.adid
                ORDER BY a.createdon DESC
                LIMIT $offset, $ads_per_page";

Edit: $where contains the search expression... if regular expression search is turned on it uses regex otherwise not... $sqlsearch contains the search words that were input by the user...

if ($regex_search) {
                $where = "(a.adtitle RLIKE '[[:<:]]{$searchsql}[[:>:]]' OR a.addesc RLIKE '[[:<:]]{$searchsql}[[:>:]]')";
            } else {
                $where = "(a.adtitle LIKE '$searchsql' OR a.addesc LIKE '$searchsql')";

The "proper" way to do this would be tackle the route cause by working out why the duplicates are appearing in the first place. It will be something to do with the JOINs but without looking at the data I'm unable to answer that. If, however you'd like a quick(ish) and dirty way to remove duplicates, could try something like below.

Disclaimer: This is completely untested so there's more likely to be a mistake or two in here - but hopefully no dealbreaker.

SELECT a2.*, UNIX_TIMESTAMP(a.createdon) AS timestamp, ct2.cityname,
       COUNT(*) AS piccount, p2.picfile,
       scat2.subcatname, cat2.catid, cat2.catname $xfieldsql
FROM
   (SELECT subq1.title, MIN(subq1.adid) AS adid
    FROM 
           (SELECT a.*, UNIX_TIMESTAMP(a.createdon) AS timestamp, ct.cityname,
                COUNT(*) AS piccount, p.picfile,
                scat.subcatname, cat.catid, cat.catname
            FROM t_ads a
                INNER JOIN t_cities ct ON a.cityid = ct.cityid
                INNER JOIN t_subcats scat ON a.subcatid = scat.subcatid
                INNER JOIN t_cats cat ON scat.catid = cat.catid
                LEFT OUTER JOIN t_adxfields axf ON a.adid = axf.adid
                LEFT OUTER JOIN t_adpics p ON a.adid = p.adid AND p.isevent = '0'
                LEFT OUTER JOIN t_featured feat ON a.adid = feat.adid AND feat.adtype = 'A'
            WHERE $where
                AND $visibility_condn
                AND (feat.adid IS NULL OR feat.featuredtill < NOW())
                $loc_condn
            GROUP BY a.adid) subq1
    GROUP BY subq.title) subq2
INNER JOIN t_ads a2 ON a2.adid = subq2.adid
INNER JOIN t_cities ct2 ON a2.cityid = ct2.cityid
INNER JOIN t_subcats scat2 ON a2.subcatid = scat2.subcatid
INNER JOIN t_cats cat2 ON scat2.catid = cat2.catid
LEFT OUTER JOIN t_adxfields axf2 ON a2.adid = axf2.adid
LEFT OUTER JOIN t_adpics p2 ON a2.adid = p2.adid AND p2.isevent = '0'
LEFT OUTER JOIN t_featured feat2 ON a2.adid = feat2.adid AND feat2.adtype = 'A'
ORDER BY a2.createdon DESC
LIMIT $offset, $ads_per_page

This could be massively simplified and tidied up eg by removing some of the stuff from the subquery but am just giving the general idea to (hopefully) get you up and running...

Explanation

subq2 simply groups by title and picks out an adid from each group (chose to use MIN here but could have used MAX instead).

subq1 is the original query but with ordering and limits removed since these are applied by the outer query.

The outer query joins back on the de-duped IDs and joins back to the ads and other tables (giving them different aliases) in order to select the fields from your original query.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM