
Calling two mysql tables with DISTINCT

This is my ongoing internal project, which I use for learning and as a base for other things.

I have a MySQL database with multiple tables, but I mainly use two of them: links_sp and link_fields_sp.

I search links with a PHP script that looks like this:

$query = "SELECT * FROM links_sp WHERE status='active' AND type='shopping' AND language='en' AND MATCH(link_url,link_title,link_keywords,link_details) AGAINST (? IN BOOLEAN MODE) $order_clause";

The search results look like this:

  1. ebay.com, laptop, sony, 15 inch

  2. ebay.com, laptop, toshiba, 15 inch

  3. bestbuy.com, laptop, sony, 15 inch

  4. ebay.com, laptop, dell, 15 inch

  5. amazon.com, laptop, sony, 15 inch

etc.

This works and is simple, but I would like to use DISTINCT on the URL host so that only one result per host is shown:

  1. ebay.com, laptop, sony, 15 inch
  2. bestbuy.com, laptop, sony, 15 inch
  3. amazon.com, laptop, sony, 15 inch, and so on.

I store the URLs in the link_url column of the links_sp table, and the host of each link in the url_host column of the link_fields_sp table. Each link has a link_id column, and the same id is used for it in the other table.
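A minimal sketch of the relationship described above, assuming the column names from the question and hypothetical sample rows: a JOIN on link_id puts the columns of matching rows from both tables side by side (UNION ALL, by contrast, appends one result set beneath another and requires both sides to have the same number of columns). To keep one link per url_host, the sketch joins to a derived table holding the lowest matching link_id per host; treating the lowest id as "the" result for a host is an assumption. It uses Python's built-in sqlite3 module as a self-contained stand-in for MySQL, so the MySQL-only MATCH ... AGAINST full-text condition is left out; in MySQL it would go into the derived table's WHERE clause next to the other filters.

```python
import sqlite3

# Hypothetical sample rows modeled on the question's example list.
# links_sp: (link_id, link_url, link_title, status, type, language)
LINKS = [
    (1, "https://ebay.com/a", "laptop sony 15 inch", "active", "shopping", "en"),
    (2, "https://ebay.com/b", "laptop toshiba 15 inch", "active", "shopping", "en"),
    (3, "https://bestbuy.com/c", "laptop sony 15 inch", "active", "shopping", "en"),
    (4, "https://ebay.com/d", "laptop dell 15 inch", "active", "shopping", "en"),
    (5, "https://amazon.com/e", "laptop sony 15 inch", "active", "shopping", "en"),
]
# link_fields_sp: (link_id, url_host), sharing link_id with links_sp
HOSTS = [(1, "ebay.com"), (2, "ebay.com"), (3, "bestbuy.com"),
         (4, "ebay.com"), (5, "amazon.com")]

QUERY = """
SELECT l.link_id, f.url_host, l.link_url, l.link_title
FROM links_sp AS l
JOIN link_fields_sp AS f ON f.link_id = l.link_id  -- combine columns per link
JOIN (
    -- one row per host: the lowest matching link_id (an assumed choice)
    SELECT f2.url_host, MIN(l2.link_id) AS first_id
    FROM links_sp AS l2
    JOIN link_fields_sp AS f2 ON f2.link_id = l2.link_id
    WHERE l2.status = 'active' AND l2.type = 'shopping' AND l2.language = 'en'
    GROUP BY f2.url_host
) AS pick ON pick.first_id = l.link_id
ORDER BY l.link_id
"""


def one_link_per_host():
    """Build an in-memory database and return one joined row per url_host."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE links_sp (link_id INTEGER PRIMARY KEY, link_url TEXT, "
                     "link_title TEXT, status TEXT, type TEXT, language TEXT)")
        conn.execute("CREATE TABLE link_fields_sp (link_id INTEGER, url_host TEXT)")
        conn.executemany("INSERT INTO links_sp VALUES (?, ?, ?, ?, ?, ?)", LINKS)
        conn.executemany("INSERT INTO link_fields_sp VALUES (?, ?)", HOSTS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in one_link_per_host():
        print(row)
```

With these sample rows the query returns link_id 1 (ebay.com), 3 (bestbuy.com) and 5 (amazon.com), i.e. one row per host.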

I tried:

$query = "SELECT DISTINCT * FROM (SELECT * FROM links_sp WHERE status='active' AND type='shopping' AND language='en' AND MATCH(link_url,link_title,link_keywords,link_details) UNION ALL SELECT * FROM link_fields_sp MATCH(url_host)) AGAINST (? IN BOOLEAN MODE) $order_clause";

but I'm doing something wrong.

I'd appreciate any input. Thank you.

"SELECT DISTINCT" operates across the entire row, just a tiny difference in ANY column position can cause the row to be considered different to every other row. These 3 rows are "distinct":

ebay.com, laptop, sony, 15 inch
ebay.com, laptop, toshiba, 15 inch
ebay.com, laptop, dell, 15 inch

NB. Using * in select distinct * makes it less likely that the number of rows will be reduced, because every added column increases the likelihood of differences between rows. By contrast, the following will reduce the number of rows:

select distinct domain from table1

GROUP BY is the alternative, and using it opens up many more options. However, even then you need to decide how to summarize the data in the columns you are not grouping by.

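Originally, MySQL defaulted to a non-standard GROUP BY syntax that permits many columns in the select clause but fewer in the group by clause (e.g. 20 columns in the select clause but just 1 in the group by clause). Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and it rejects such queries unless the extra columns are functionally dependent on the grouped ones. For example: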

## non-standard sql syntax in MySQL
select   domain, type, supplier, size from table1
group by domain

But consider this carefully: what does this query return as soon as a domain has multiple values in type, supplier or size? With this non-standard approach the values in those columns are "indeterminate" (there is no defined rule for which row's value you see in those columns).

By contrast, the SQL standard requires that every non-aggregated column in the select list also be listed in the group by clause, e.g.

## standard sql syntax for group by clause
select   domain, type, supplier, size from table1
group by domain, type, supplier, size

(that example produces the same results as select distinct would)

So, to use GROUP BY to get fewer rows while still applying some "logic" to the other columns, you could do something like this:

## standard sql syntax
select   domain, max(type), max(supplier), max(size) from table1
group by domain

However, while this reduces the rows, the other columns are somewhat arbitrary: each MAX() is evaluated independently, so the values in one result row might not come from the same source row. This is the real challenge: reducing the rows is easy, but keeping the other columns meaningful at the same time is not.
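The mixing of rows can be seen with a few hypothetical rows (not the answer's Table1 data, where type and size are identical in every row and so hide the effect). This sketch runs the query through Python's sqlite3 module as a stand-in for MySQL; MAX() behaves the same way there, computing each aggregate over its own column independently. It also shows that MAX() on a text column compares strings, so '6 inch' sorts above '13 inch'.

```python
import sqlite3

# Hypothetical rows chosen so that ebay.com's per-column maxima
# come from different source rows.
ROWS = [
    ("ebay.com", "laptop", "toshiba", "13 inch"),
    ("ebay.com", "phone", "sony", "6 inch"),
    ("amazon.com", "laptop", "sony", "15 inch"),
]


def max_per_column():
    """Run the 'max() per column' GROUP BY query on the hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (domain TEXT, type TEXT, supplier TEXT, size TEXT)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
        # Each MAX() picks its own winner: type from one row, supplier from another.
        return conn.execute(
            "SELECT domain, MAX(type), MAX(supplier), MAX(size) "
            "FROM table1 GROUP BY domain ORDER BY domain"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in max_per_column():
        print(row)
```

For ebay.com the query returns ('ebay.com', 'phone', 'toshiba', '6 inch'), a combination that does not exist in any of the source rows.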

In MySQL, a common technique to overcome this difficulty is to use GROUP_CONCAT(), or a combination of CONCAT() with GROUP_CONCAT(); see queries 6 and 7 below.

SQL Fiddle

Setup :

CREATE TABLE Table1
    (`domain` varchar(11), `type` varchar(6), `supplier` varchar(7), `size` varchar(7))
;

INSERT INTO Table1
    (`domain`, `type`, `supplier`, `size`)
VALUES
    ('ebay.com', 'laptop', 'sony', '15 inch'),
    ('ebay.com', 'laptop', 'toshiba', '15 inch'),
    ('bestbuy.com', 'laptop', 'sony', '15 inch'),
    ('ebay.com', 'laptop', 'dell', '15 inch'),
    ('amazon.com', 'laptop', 'sony', '15 inch')
;

Query 1 :

select distinct * from table1

Results :

|      domain |   type | supplier |    size |
|-------------|--------|----------|---------|
|    ebay.com | laptop |     sony | 15 inch |
|    ebay.com | laptop |  toshiba | 15 inch |
| bestbuy.com | laptop |     sony | 15 inch |
|    ebay.com | laptop |     dell | 15 inch |
|  amazon.com | laptop |     sony | 15 inch |

Query 2 :

select distinct domain from table1

Results :

|      domain |
|-------------|
|    ebay.com |
| bestbuy.com |
|  amazon.com |

Query 3 :

## non-standard sql syntax in MySQL
select   domain, type, supplier, size from table1
group by domain

Results :

|      domain |   type | supplier |    size |
|-------------|--------|----------|---------|
|  amazon.com | laptop |     sony | 15 inch |
| bestbuy.com | laptop |     sony | 15 inch |
|    ebay.com | laptop |     sony | 15 inch |

Query 4 :

## standard sql syntax for group by clause
select   domain, type, supplier, size from table1
group by domain, type, supplier, size

Results :

|      domain |   type | supplier |    size |
|-------------|--------|----------|---------|
|  amazon.com | laptop |     sony | 15 inch |
| bestbuy.com | laptop |     sony | 15 inch |
|    ebay.com | laptop |     dell | 15 inch |
|    ebay.com | laptop |     sony | 15 inch |
|    ebay.com | laptop |  toshiba | 15 inch |

Query 5 :

## standard sql syntax
select   domain, max(type), max(supplier), max(size) from table1
group by domain

Results :

|      domain | max(type) | max(supplier) | max(size) |
|-------------|-----------|---------------|-----------|
|  amazon.com |    laptop |          sony |   15 inch |
| bestbuy.com |    laptop |          sony |   15 inch |
|    ebay.com |    laptop |       toshiba |   15 inch |

Query 6 :

## use group_concat
select   domain, group_concat(type), group_concat(supplier), group_concat(size) from table1
group by domain

Results :

|      domain |   group_concat(type) | group_concat(supplier) |      group_concat(size) |
|-------------|----------------------|------------------------|-------------------------|
|  amazon.com |               laptop |                   sony |                 15 inch |
| bestbuy.com |               laptop |                   sony |                 15 inch |
|    ebay.com | laptop,laptop,laptop |      sony,toshiba,dell | 15 inch,15 inch,15 inch |

Query 7 :

## use both concat and group_concat
select
       domain
     , group_concat(concat('[',type,',',supplier,',',size,']')) type_supplier_size
from table1
group by domain

Results :

|      domain |                                                   type_supplier_size |
|-------------|----------------------------------------------------------------------|
|  amazon.com |                                                [laptop,sony,15 inch] |
| bestbuy.com |                                                [laptop,sony,15 inch] |
|    ebay.com | [laptop,sony,15 inch],[laptop,toshiba,15 inch],[laptop,dell,15 inch] |
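When one complete row per domain is wanted rather than a concatenated list, a window function keeps every column from the same source row. This sketch is not part of the original fiddle: it assumes MySQL 8.0 or later (ROW_NUMBER() is not available in MySQL 5.x) and runs the query through Python's sqlite3 module (SQLite 3.25+ also supports window functions) on the same five Table1 rows. Ordering by supplier inside OVER() is an arbitrary, hypothetical choice of which row wins; any column that expresses the "best" row for the application could be used instead.

```python
import sqlite3

# The same five rows as in the Table1 setup above.
TABLE1 = [
    ("ebay.com", "laptop", "sony", "15 inch"),
    ("ebay.com", "laptop", "toshiba", "15 inch"),
    ("bestbuy.com", "laptop", "sony", "15 inch"),
    ("ebay.com", "laptop", "dell", "15 inch"),
    ("amazon.com", "laptop", "sony", "15 inch"),
]

# ROW_NUMBER() numbers the rows within each domain; keeping rn = 1 returns
# exactly one whole row per domain. ORDER BY supplier is an assumed tie-break.
QUERY = """
SELECT domain, type, supplier, size
FROM (
    SELECT domain, type, supplier, size,
           ROW_NUMBER() OVER (PARTITION BY domain ORDER BY supplier) AS rn
    FROM Table1
) AS ranked
WHERE rn = 1
ORDER BY domain
"""


def first_row_per_domain():
    """Return one complete Table1 row per domain."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Table1 (domain TEXT, type TEXT, supplier TEXT, size TEXT)")
        conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", TABLE1)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_row_per_domain():
        print(row)
```

With this ordering, ebay.com is represented by its dell row, while amazon.com and bestbuy.com keep their only rows.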
