简体   繁体   中英

How I fixed the incompatible error in PostgreSQL while pivoting?

I'm pivoting in PostgreSQL but when I run the query the output says:

 ERROR: return and sql tuple descriptions are incompatible SQL state: 42601

Summarizing, I want the distribution channel on raw, the year in the columns and the operative margin as value.

  • dist_chann_id --> integer
  • year --> year
  • operative_margin --> integer

Without the pivot the output is:

dist_chann_name year operative_margin
1 2020 20783
1 2021 5791
2 2020 30362
3 2021 14501
3 2020 2765
3 2021 4535

This is my query:

SELECT *
FROM   crosstab(
'SELECT dist_chann_id, year, operative_margin
FROM marginality_by_channel
ORDER BY dist_chann_id, year'
     ) AS ct ("DC" int, "2020" int, "2021" int);

Using filtered aggregation is typically much easier than the somewhat convoluted crosstab() function (at least in my opinion).

select dist_chann_name as dc,
       sum(operative_margin) filter (where year = 2020) as "2020",
       sum(operative_margin) filter (where year = 2021) as "2021"
from marginality_by_channel
group by dist_chann_name
order by dist_chann_name;

Source of the error msg

One of the columns does not have data type you think it has. Must be operative_margin , probably text ?

The 1-parameter form of crosstab() only uses the "category" column ( year in your example) only for sorting. And the "row_name" column ( dist_chann_name - or dist_chann_id ?) would produce a different error msg.

Solution

Either way, unless you can guarantee that every "row_name" has exactly two values to it, it's safer to use the 2-parameter form of corosstab() :

SELECT *
FROM  crosstab(
   $$
   SELECT dist_chann_name, year, operative_margin
   FROM   marginality_by_channel
   ORDER  BY 1, 2
   $$
 , 'VALUES (2020), (2021)'
   ) AS ct ("DC" int, "2020" int, "2021" int);

db<>fiddle here

This variant also happens to be more tolerant with type mismatches (as everything is passed as text anyway). See:

crosstab() shines for many resulting value columns (faster, shorter). For just two "value" columns, aggregate FILTER might be the better (simpler) choice. Not much performance to gain (if any, after adding some overhead). See:

Broken setup

That aside, your setup is ambiguous to begin with. It includes two rows for the same (dist_chann_name, year) = (3, 2021) .

  • a_horse uses sum() in his aggregate FILTER solution. You might also use min() or max() , or whatever ...
  • My solution with the 2-parameter form outputs the last value according to sort order. (Think of it as each next value overwriting it's dedicated spot.)
  • The 1-parameter form outputs the first value according to sort order. (Think of it as "first come, first serve". Superfluous rows are discarded.)

A clean solution would use an explicit sort order and document the effect, or work with a query producing distinct values, or use the appropriate aggregate function with the FILTER solution.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM