
How to aggregate an array of JSON objects with Postgres?

I'm looking to aggregate an array of JSON objects with Postgres, specifically to return a list of related rows from another table by foreign key. In this case it's a user and their teams.

Here's the schema I'm working with:

CREATE TABLE teams (
  id TEXT PRIMARY KEY,
  ...
);

CREATE TABLE users (
  id TEXT PRIMARY KEY,
  ...
);

CREATE TABLE memberships (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL REFERENCES users(id),
  team_id TEXT NOT NULL REFERENCES teams(id)
);

With the following query:

  SELECT
    users.id,
    ...
    CASE
      WHEN count(teams.*) = 0
      THEN '[]'::JSON
      ELSE json_agg(DISTINCT teams.id)
    END AS teams
  FROM users
  LEFT JOIN memberships ON users.id = memberships.user_id
  LEFT JOIN teams ON teams.id = memberships.team_id
  WHERE users.id = $[userId]
  GROUP BY
    users.id,
    ...

I can get results as a flat array of team_ids:

{
  id: 'user_1',
  ...
  teams: ['team_1', 'team_2']
}

But I'd like to receive the results as JSON objects instead:

{
  id: 'user_1',
  ...
  teams: [
    { id: 'team_1' },
    { id: 'team_2' }
  ]
}

I get extremely close with:

  SELECT
    users.id,
    ...
    CASE
      WHEN count(teams.*) = 0
      THEN '[]'::JSON
      ELSE json_agg(json_build_object('id', teams.id))
    END AS teams
  FROM users
  LEFT JOIN memberships ON users.id = memberships.user_id
  LEFT JOIN teams ON teams.id = memberships.team_id
  WHERE users.id = $[userId]
  GROUP BY
    users.id,
    ...

But now I've lost the de-duplication that DISTINCT provided, so I end up with duplicate IDs returned for each team.

You can solve this with a sub-query that selects the distinct user/team combinations and then aggregates them into a JSON array:

SELECT id, json_strip_nulls(json_agg(json_build_object('id', team))) AS teams
FROM (
  SELECT DISTINCT user_id AS id, team_id AS team
  FROM memberships
  WHERE user_id = $[userId]) sub
GROUP BY id;

You can get both the user id and the team id from the memberships table, so there is no point in joining either table to the memberships table (unless you need other fields from those tables that you haven't shown us). If you do want to use other fields, you can paste the JOINs right back in.

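The json_strip_nulls() function, new in PG 9.5, removes object fields whose value is null, so an element such as {"id": null} becomes {}. It does not remove the element itself: [{"id": null}] becomes [{}], not an empty '[]'::json. In this query the case does not arise anyway, because a user with no memberships has no rows in the sub-query and therefore produces no result row at all. Either way, the rather ugly and inefficient CASE clause is no longer needed.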
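
A hedged alternative that keeps the de-duplication while still producing objects: the json type has no equality operator, so DISTINCT cannot be applied to json_build_object() output, but jsonb does support equality. The sketch below assumes PostgreSQL 9.5 or later (for jsonb_build_object and jsonb_agg) and the schema from the question; the literal 'user_1' is only a placeholder for the bound parameter. FILTER drops the NULL row that the LEFT JOIN produces for a user without teams, and COALESCE turns the resulting NULL aggregate into an empty array.

```sql
-- Sketch only: assumes the users / memberships / teams tables from the question.
SELECT
  u.id,
  COALESCE(
    -- jsonb values can be compared, so DISTINCT is allowed here
    jsonb_agg(DISTINCT jsonb_build_object('id', t.id))
      FILTER (WHERE t.id IS NOT NULL),   -- skip the unmatched LEFT JOIN row
    '[]'::jsonb                          -- user with no teams
  ) AS teams
FROM users u
LEFT JOIN memberships m ON m.user_id = u.id
LEFT JOIN teams t ON t.id = m.team_id
WHERE u.id = 'user_1'                    -- placeholder; bind your own parameter
GROUP BY u.id;
```

If the client specifically needs the json type, the teams column can be cast back with ::json.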

It looks to me like this will do it:

SELECT  json_build_object(
          'id',    u.id,
          'teams', array_remove(array_agg(DISTINCT t.*), NULL))
FROM    users u
LEFT OUTER JOIN memberships m
ON      m.user_id = u.id
LEFT OUTER JOIN teams t
ON      m.team_id = t.id
GROUP BY u.id

Works in 9.4. Removing the NULLs is necessary for users with no team.

I suspect a good general principle when doing JSON in Postgres is to stick with arrays and records as long as possible, and only switch to JSON at the last moment. The more traditional structures have been around longer and are more closely tied to the relational model, so you're less likely to run into problems using them. You can see that this query could have just as easily returned a column named id and an array-valued column named teams.

Note this query gives all users. If you want just one, put that in a WHERE clause.
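
To illustrate the point above about staying with arrays until the last moment, here is a sketch of the same query returning plain columns, assuming the schema from the question. The outer row_to_json() call is one way to convert at the very end, and the alias per_user is made up for this example. Note that teams comes out as a flat array of ids here, not an array of objects.

```sql
-- Sketch only: inner query works purely with relational types (text, text[]).
SELECT row_to_json(per_user)
FROM (
  SELECT  u.id,
          array_remove(array_agg(DISTINCT t.id), NULL) AS teams
  FROM    users u
  LEFT OUTER JOIN memberships m ON m.user_id = u.id
  LEFT OUTER JOIN teams t ON m.team_id = t.id
  -- WHERE u.id = 'user_1'   -- uncomment to restrict to one user (placeholder value)
  GROUP BY u.id
) per_user;
```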
