SQL query keeps showing duplicates

I have a table with countries, regions, and different measurements. I'm trying to sum a specific column over all rows that share a region, and then show that sum for each country. The problem is that each country appears more than once in my table, because another column is a "consumption category".

Right now my query is:

SELECT main.country, main.region, (SELECT SUM(sec.share) 
FROM data_xlsx_Hoja2 sec 
WHERE sec.region = main.region AND sec.segment="lowest" AND sec.category="food") as total 
FROM data_xlsx_Hoja2 main

The result is like this:

+---------+-------------------------+-------------------+
| country | region                  | total             |
+---------+-------------------------+-------------------+
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
| Albania | Europe and Central Asia | 8.152791917324066 |
+---------+-------------------------+-------------------+

I need my query to show each country just once, but with the number the query already shows. I tried GROUP BY and SELECT DISTINCT, but then the query just keeps loading and never returns a result. The table has around 30,000 rows.

My first observation is: do you really want the share by region, or by country? By country makes more sense, and would look something like this:

SELECT h.region, h.country, SUM(h.share) 
FROM data_xlsx_Hoja2 h 
WHERE h.segment = 'lowest' AND h.category = 'food'
GROUP BY h.region, h.country;
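To see how this collapses the repeated rows, here is a minimal sketch that runs the same query against a tiny made-up sample. It uses SQLite through Python's sqlite3 module instead of MySQL, and the rows and share values are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_xlsx_Hoja2 "
    "(country TEXT, region TEXT, segment TEXT, category TEXT, share REAL)"
)

# Invented sample rows: each country appears once per segment/category pair.
rows = [
    ("Albania", "Europe and Central Asia", "lowest", "food", 1.5),
    ("Albania", "Europe and Central Asia", "lowest", "housing", 2.0),
    ("Albania", "Europe and Central Asia", "higher", "food", 4.0),
    ("Armenia", "Europe and Central Asia", "lowest", "food", 2.5),
]
conn.executemany("INSERT INTO data_xlsx_Hoja2 VALUES (?, ?, ?, ?, ?)", rows)

# Filter first, then group: one output row per (region, country).
per_country = conn.execute(
    """
    SELECT h.region, h.country, SUM(h.share)
    FROM data_xlsx_Hoja2 h
    WHERE h.segment = 'lowest' AND h.category = 'food'
    GROUP BY h.region, h.country
    ORDER BY h.country
    """
).fetchall()

for row in per_country:
    print(row)
```

Each country comes back exactly once, because the WHERE clause drops the other categories before the grouping happens.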

If you want the region sum per country, then you need to get a list of countries. Something like this:

SELECT rc.*, r.region_share
FROM (SELECT DISTINCT h.region, h.country
      FROM data_xlsx_Hoja2 h 
     ) rc LEFT JOIN
     (SELECT h.region, SUM(h.share) as region_share
      FROM data_xlsx_Hoja2 h 
      WHERE h.segment = 'lowest' AND h.category = 'food'
      GROUP BY h.region
     ) r
     ON rc.region = r.region;
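As a rough check of that join, the sketch below runs it on a few invented rows, again using SQLite via Python's sqlite3 as a stand-in for MySQL. The sample values are assumptions; the query is the one above, with an ORDER BY added so the output order is predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_xlsx_Hoja2 "
    "(country TEXT, region TEXT, segment TEXT, category TEXT, share REAL)"
)

# Invented sample rows for illustration only.
rows = [
    ("Albania", "Europe and Central Asia", "lowest", "food", 1.5),
    ("Albania", "Europe and Central Asia", "lowest", "housing", 2.0),
    ("Albania", "Europe and Central Asia", "higher", "food", 4.0),
    ("Armenia", "Europe and Central Asia", "lowest", "food", 2.5),
    ("Bolivia", "Latin America", "lowest", "food", 3.0),
    ("Bolivia", "Latin America", "lowest", "housing", 1.0),
]
conn.executemany("INSERT INTO data_xlsx_Hoja2 VALUES (?, ?, ?, ?, ?)", rows)

# rc: one row per (region, country); r: one filtered sum per region.
region_per_country = conn.execute(
    """
    SELECT rc.country, rc.region, r.region_share
    FROM (SELECT DISTINCT h.region, h.country
          FROM data_xlsx_Hoja2 h
         ) rc LEFT JOIN
         (SELECT h.region, SUM(h.share) AS region_share
          FROM data_xlsx_Hoja2 h
          WHERE h.segment = 'lowest' AND h.category = 'food'
          GROUP BY h.region
         ) r
         ON rc.region = r.region
    ORDER BY rc.country
    """
).fetchall()

for row in region_per_country:
    print(row)
```

Both derived tables are computed once and then joined, so the region sum is not recalculated for every row the way the correlated subquery in the question is.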

Try this. As I understand your question, this is what you need:

SELECT country, region, SUM(share) AS total
FROM data_xlsx_Hoja2 sec
WHERE sec.segment = 'lowest' AND sec.category = 'food'
GROUP BY country, region;

Please clarify your question so we can give a proper answer.

One would expect a country table and a region table. As is, we must create a country table from your data table first:

select distinct country from data_xlsx_hoja2;

Then you want share sums per region:

select region, sum(share) from data_xlsx_hoja2 group by region;

Now you want to join countries to their region, but oops ... in your data model a country can belong to different regions, because there is no country table with one record per country, each with a region ID. The same country can appear in data_xlsx_hoja2 with different regions. It can even appear under several spellings ('Albania', 'ALBANIA', 'Republic of Albania', ...). It's time you normalized your database.

What we can do is work around this by deriving a country table, with one region per country, from your data table:

select country, any_value(region) from data_xlsx_hoja2 group by country;

The complete query:

select c.country, r.total_share
from (select country, min(region) as region from data_xlsx_hoja2 group by country) c
join (select region, sum(share) as total_share from data_xlsx_hoja2 group by region) r
  using (region)
order by c.country;

Place your conditions ( segment = 'lowest' AND category = 'food' ) where appropriate. Do you only want to show countries that have matching records? Or do you merely want to exclude these records from the region sums?
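The two placements give different results. The sketch below shows both on invented rows, using SQLite via Python's sqlite3 as a stand-in for MySQL; the sample data, including a "Chile" row with no food record, is an assumption made up to expose the difference.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_xlsx_hoja2 "
    "(country TEXT, region TEXT, segment TEXT, category TEXT, share REAL)"
)

# Invented sample rows; Chile has no 'lowest'/'food' record.
rows = [
    ("Albania", "Europe and Central Asia", "lowest", "food", 1.5),
    ("Albania", "Europe and Central Asia", "lowest", "housing", 2.0),
    ("Armenia", "Europe and Central Asia", "lowest", "food", 2.5),
    ("Bolivia", "Latin America", "lowest", "food", 3.0),
    ("Chile", "Latin America", "lowest", "housing", 1.0),
]
conn.executemany("INSERT INTO data_xlsx_hoja2 VALUES (?, ?, ?, ?, ?)", rows)

cond = "segment = 'lowest' AND category = 'food'"

# Variant A: condition only in the region sums.
# Every country is listed; only the sums are restricted.
sums_filtered = conn.execute(
    f"""
    SELECT c.country, r.total_share
    FROM (SELECT country, MIN(region) AS region
          FROM data_xlsx_hoja2 GROUP BY country) c
    JOIN (SELECT region, SUM(share) AS total_share
          FROM data_xlsx_hoja2 WHERE {cond} GROUP BY region) r
      USING (region)
    ORDER BY c.country
    """
).fetchall()

# Variant B: condition in both subqueries.
# Countries without a matching record disappear from the list.
both_filtered = conn.execute(
    f"""
    SELECT c.country, r.total_share
    FROM (SELECT country, MIN(region) AS region
          FROM data_xlsx_hoja2 WHERE {cond} GROUP BY country) c
    JOIN (SELECT region, SUM(share) AS total_share
          FROM data_xlsx_hoja2 WHERE {cond} GROUP BY region) r
      USING (region)
    ORDER BY c.country
    """
).fetchall()

print(sums_filtered)
print(both_filtered)
```

In this sample, variant A still lists Chile with the Latin America total, while variant B drops it. Which one is right depends on the answer to the two questions above.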

Anyway, you should really fix your data model:

  • table region (region_id, region_name)
  • table country (country_id, country_name, region_id)
  • table data_xlsx_hoja2(data_xlsx_hoja2_id, country_id, share)
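As a sketch of what that model could look like in practice, the example below creates the three tables in SQLite via Python's sqlite3 and computes the region total per country. The column types, key constraints, and sample rows are assumptions; the table and column names follow the list above. In a real schema the data table would presumably also keep segment and category columns, which are left out here to match the list.

```python
import sqlite3

# In-memory SQLite database; types and constraints are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE country (
        country_id   INTEGER PRIMARY KEY,
        country_name TEXT NOT NULL UNIQUE,
        region_id    INTEGER NOT NULL REFERENCES region (region_id)
    );
    CREATE TABLE data_xlsx_hoja2 (
        data_xlsx_hoja2_id INTEGER PRIMARY KEY,
        country_id         INTEGER NOT NULL REFERENCES country (country_id),
        share              REAL NOT NULL
    );
    """
)

# Invented sample rows for illustration only.
conn.executemany(
    "INSERT INTO region VALUES (?, ?)",
    [(1, "Europe and Central Asia"), (2, "Latin America")],
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [(1, "Albania", 1), (2, "Armenia", 1), (3, "Bolivia", 2)],
)
conn.executemany(
    "INSERT INTO data_xlsx_hoja2 VALUES (?, ?, ?)",
    [(1, 1, 1.5), (2, 1, 2.0), (3, 2, 2.5), (4, 3, 3.0)],
)

# Each country now has exactly one region, so no DISTINCT or MIN trick is needed.
totals = conn.execute(
    """
    SELECT c.country_name, r.region_name, t.total_share
    FROM country c
    JOIN region r ON r.region_id = c.region_id
    JOIN (SELECT c2.region_id, SUM(d.share) AS total_share
          FROM data_xlsx_hoja2 d
          JOIN country c2 ON c2.country_id = d.country_id
          GROUP BY c2.region_id) t
      ON t.region_id = c.region_id
    ORDER BY c.country_name
    """
).fetchall()

for row in totals:
    print(row)
```

With one row per country in the country table, a country can no longer end up in two regions or under two spellings, and the duplicate problem from the question cannot arise.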
