
How to use GROUP BY in PostgreSQL

The objective is to write a query using two different tables: country and city. Country contains name (of the country) and country_code (primary key), and city contains name (of the city), population, and country_code (primary key). I want to use GROUP BY with an aggregate function, but the query I have below doesn't work.

For each country, list the largest population of any of its cities and the name of that city. So I need to list the cities with the largest population of each country.

So what should be displayed is the country, the city (with the largest population), then the population of that city. There should be only one city per country.

$query6 = "SELECT c.name AS country, ci.name AS city,
GREATEST(ci.population) AS max_pop
FROM lab6.country c INNER JOIN lab6.city ci
ON(c.country_code = ci.country_code)
GROUP BY c.name
ORDER BY country ASC";

I have also tried GROUP BY country, DISTINCT c.name.

I am new to aggregate functions, so if there are specific situations where you are supposed to use GROUP BY and this is not one of them, please let me know.

I am using PHP to run the query like so:

$result = pg_query($connection, $query);
if(!$result)
{
       die("Failed to connect to database");
}

The error is:

ERROR: column "ci.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT DISTINCT c.name AS country, ci.name AS city,

The tables are given to us; we don't create them. I can't include a screenshot of the tables because I don't have enough reputation.

Some DDL to play with.

create table country (
  country_code char(2) primary key, -- ISO country code
  country_name varchar(35) not null unique
);

insert into country values 
('US', 'United States of America'),
('IT', 'Italy'),
('IN', 'India');

-- The full name of a city is more than city name plus country name.
-- In the US, there are a couple of dozen cities named Springfield,
-- each in a different state. I'd be surprised if this weren't true
-- in most countries.
create table city (
  country_code char(2) not null references country (country_code),
  name varchar(35) not null,
  population integer not null check (population > 0),
  primary key (country_code, name)
);

insert into city values 
('US', 'Rome, GA', 36303),
('US', 'Washington, DC', 632323),
('US', 'Springfield, VA', 30484),
('IT', 'Rome', 277979),
('IT', 'Milan', 1324110),
('IT', 'Bari', 320475),
('IN', 'Mumbai', 12478447),
('IN', 'Patna', 1683200),
('IN', 'Cuttack', 606007);

Largest population in a country.

select country.country_code, max(city.population) as max_population
from country
inner join city on country.country_code = city.country_code
group by country.country_code;

There are several ways to use that in order to get the result you want. One way is to use an inner join on a common table expression.

with max_population as (
  select country.country_code, max(city.population) as max_population
  from country
  inner join city on country.country_code = city.country_code
  group by country.country_code
)
select city.country_code, city.name, city.population
from city
inner join max_population 
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population;

Another way is to use an inner join on a subquery. (The text of the common table expression goes "into" the main query. Using the alias "max_population", the query requires no further changes to work.)

select city.country_code, city.name, city.population
from city
inner join (select country.country_code, max(city.population) as max_population
            from country
            inner join city on country.country_code = city.country_code
            group by country.country_code
           ) max_population 
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population;
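
The question asks for the country's name rather than its code. As a minimal sketch, assuming the sample DDL above (where the name column is country_name in country and name in city), one more join on country supplies it. The city_name alias and the ORDER BY are illustrative additions, not part of the queries above.

```sql
-- Sketch against the sample DDL above.
-- The derived table finds each country's maximum city population;
-- the join on country swaps the code for the readable name.
select country.country_name, city.name as city_name, city.population
from city
inner join country on country.country_code = city.country_code
inner join (select country_code, max(population) as max_population
            from city
            group by country_code
           ) max_population
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population
order by country.country_name;
```

With the sample rows above, this should return Mumbai for India, Milan for Italy, and Washington, DC for the United States of America.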

Yet another way is to use a windowing function in a subquery. You need to select from the subquery, because you can't directly use the result of rank() in a WHERE clause. That is, this works.

select country_code, name, population
from (select country_code, name, population,
      rank() over (partition by country_code 
                   order by population desc) as city_population_rank
      from city
     ) city_population_rankings
where city_population_rank = 1;

But this doesn't, even though it makes more sense at first glance.

select country_code, name, population,
       rank() over (partition by country_code 
                    order by population desc) as city_population_rank
from city
where city_population_rank = 1;

ERROR:  column "city_population_rank" does not exist

The best way to do this in recent versions of PostgreSQL is with window functions (see the PostgreSQL docs). Before they were available, it was necessary to do ugly things when you wanted to carry into the final output some other columns of a special row, e.g., the row with the maximum population.

WITH preliminary AS 
     (SELECT country_code, name AS city_name, population, -- "name" is the city column in the sample DDL above
      rank() OVER (PARTITION BY country_code ORDER BY population DESC) AS r
      FROM country
      NATURAL JOIN city) -- NATURAL JOIN collapses 2 country_code columns into 1
SELECT * FROM preliminary WHERE r=1;

This also does something intelligent in the admittedly unlikely case that two or more of a country's largest cities have exactly the same population: rank() gives every tied city rank 1, so all of them are returned.
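
If exactly one row per country is wanted even when populations tie, one possible alternative is PostgreSQL's DISTINCT ON, which keeps only the first row of each group in ORDER BY order. The sketch below assumes the sample city table from the DDL above; breaking ties alphabetically by name is an arbitrary choice made for illustration.

```sql
-- Sketch against the sample city table above (PostgreSQL-specific syntax).
-- DISTINCT ON keeps the first row per country_code in ORDER BY order,
-- so the largest city comes first; "name" is an arbitrary tie-breaker.
select distinct on (country_code)
       country_code, name, population
from city
order by country_code, population desc, name;
```

The DISTINCT ON expression has to match the leftmost ORDER BY expression, which is why country_code leads the sort.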

[Edit in response to comment]

Before windowing, my usual approach was

SELECT country_code, name AS city_name, population -- "name" is the city column in the sample DDL above
FROM country co1 NATURAL JOIN city ci1
WHERE ROW(co1.country_code, ci1.population) =
    (SELECT co2.country_code, ci2.population 
     FROM country co2 NATURAL JOIN city ci2
     WHERE co1.country_code = co2.country_code 
     ORDER BY ci2.population DESC LIMIT 1);
-- note for lurkers, some other DBs use TOP 1 instead of LIMIT

The performance of this is not too bad: if the DB is indexed intelligently, Postgres can optimize the subquery. Compare this to the inner-join-on-a-subquery approach in Mike Sherrill's answer.
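
What "indexed intelligently" looks like is not spelled out here. One plausible reading, offered only as an assumption, is a composite index on country_code and population for the sample city table above. The index name is hypothetical, and whether the planner uses the index depends on table size and statistics, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical index on the sample city table above.
-- Within each country_code, entries are ordered by population descending,
-- so a per-country "largest city" lookup may be served from the index.
create index city_country_code_population_idx
    on city (country_code, population desc);

-- Inspect the plan for a single-country lookup.
explain
select name, population
from city
where country_code = 'IT'
order by population desc
limit 1;
```

On a table as small as the sample data, the planner will likely prefer a sequential scan regardless of the index.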

Favor us with the instructor's answer, would you? With the equipment you have so far, it will likely be inefficient, incomplete in case of ties, or both.
