
Aggregating on an array field

I have the following data:

movie (string)         genres (string[])
"titanic"              ["romance", "historical", "drama"]
"spider-man"           ["sci-fi", "action"]
"casablanca"           ["romance", "classic"]

Is there a "standard" way -- at least conceptually speaking -- to aggregate on an array field? For example, doing something like:

SELECT genres, count(*) GROUP BY genres ORDER BY count(*) DESC, genres

It seems to me the result should be something like:

genres         count
"romance"      2
"action"       1
"classic"      1
"drama"        1
"historical"   1
"sci-fi"       1

Is this how most database engines aggregate on an array field? I'd be interested to see some examples of how the aggregation would be done in a given engine.

Usually when I've tried this I get complaints and have to make the conversion manually, for example in pandas:

import pandas as pd

df = pd.DataFrame({
    'movie': ['titanic', 'spider-man', 'casablanca'],
    'genres': [['romance', 'historical', 'drama'], ['sci-fi', 'action'], ['romance', 'classic']],
})
df.groupby('genres').first()  # will error, or converting to tuple will not unnest the array

> Is this how most db engines do aggregating on an array field?

"Most DB engines" do not support arrays to begin with. To my knowledge only Postgres, H2 and HSQLDB fully support arrays.

In the SQL standard you would need to unnest() the array in order to achieve this (the following is Postgres syntax, but I think it's pretty close to the SQL standard):

SELECT ut.genre, count(*)
FROM the_table
  CROSS JOIN LATERAL unnest(genres) AS ut(genre)
GROUP BY ut.genre
ORDER BY count(*) DESC, ut.genre
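
For the pandas case from the question, the closest counterpart to unnest() is probably DataFrame.explode(), which emits one row per list element. The following is only a sketch using the question's sample data; the variable names `exploded` and `counts` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "movie": ["titanic", "spider-man", "casablanca"],
    "genres": [
        ["romance", "historical", "drama"],
        ["sci-fi", "action"],
        ["romance", "classic"],
    ],
})

# explode() turns each list element into its own row, similar to unnest() in SQL
exploded = df.explode("genres")

# Count rows per genre, then order by count descending and genre ascending
counts = (
    exploded.groupby("genres")
    .size()
    .reset_index(name="count")
    .sort_values(["count", "genres"], ascending=[False, True])
    .reset_index(drop=True)
)
print(counts)
```

With the sample data this should list "romance" first with a count of 2, followed by the remaining genres in alphabetical order with a count of 1 each.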

The three RDBMS I know of that support arrays would group by the complete array, not by its individual elements. And at least in Postgres, the order of the elements matters: ['romance', 'classic'] is a different array than ['classic', 'romance'].

So GROUP BY genres would return three distinct rows.
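
As a rough analogy in Python (not a description of how any database implements this), grouping by the whole array behaves like using each list, converted to a tuple, as the group key. The names `rows`, `whole_array_counts` and `order_matters` are hypothetical and only serve the illustration.

```python
from collections import Counter

rows = [
    ("titanic", ["romance", "historical", "drama"]),
    ("spider-man", ["sci-fi", "action"]),
    ("casablanca", ["romance", "classic"]),
]

# Lists are unhashable, so each array is converted to a tuple to act as a group key.
# The key is the complete array, not its individual elements.
whole_array_counts = Counter(tuple(genres) for _, genres in rows)

# Tuples compare element by element, so element order is part of the key.
order_matters = ("romance", "classic") != ("classic", "romance")

for key, count in whole_array_counts.items():
    print(key, count)
```

Each of the three movies has a different genre list, so this produces three groups of one row each, which is the behavior described above rather than the per-genre counts the question asks for.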

I think this is also what the SQL standard defines, but I am not sure about that.
