
Tricky PostgreSQL join and order query

I've got four tables in a PostgreSQL 9.3.6 database:

  • sections
  • fields (child of sections)
  • entries (child of sections)
  • data (child of entries)

CREATE TABLE sections (
  id serial PRIMARY KEY,
  title text,
  "group" integer
);

CREATE TABLE fields (
  id serial PRIMARY KEY,
  title text,
  section integer,
  type text,
  "default" json
);

CREATE TABLE entries (
  id serial PRIMARY KEY,
  section integer
);

CREATE TABLE data (
  id serial PRIMARY KEY,
  data json,
  field integer,
  entry integer
);

I'm trying to generate a page that looks like this:

section title

          field 1 title  | field 2 title | field 3 title 
entry 1 | data 'as' json | data 1 json   | data 3 json        <-- table
entry 2 | data 'df' json | data 5 json   | data 6 json  
entry 3 | data 'gh' json | data 8 json   | data 9 json  

The way I have it set up right now, each piece of 'data' is linked to an entry, has a corresponding field (whose columns determine how the data's json column should be interpreted), a json column to store different types of data, and an id (1-9 in the table above).

In this example there are 3 entries and 3 fields, and there is a data row for each of the cells in between.

It's set up like this because one section can have a different number and type of fields than another section, and therefore different quantities and types of data.

Challenge 1:

I'm trying to join the tables together in a way that makes the result sortable by any of the columns (the contents of the data json for that field). For example, if I sort by field 3 (the third column) in reverse order, the table would look like this:

section title

          field 1 title  | field 2 title | field 3 title 
entry 3 | data 'gh' json | data 8 json   | data 9 json  
entry 2 | data 'df' json | data 5 json   | data 6 json  
entry 1 | data 'as' json | data 1 json   | data 3 json        <-- table

I'm open to doing it another way too if there's a better one.

Challenge 2:

Each field has a 'default value' column. Ideally I only have to create 'data' rows when they hold a value other than that default. So if field 2's default value were 'asdf', the table might actually look like this:

section title

          field 1 title  | field 2 title | field 3 title 
entry 3 | data 'gh' json | data 8 json   | data 9 json  
entry 2 | data 'df' json | 'asdf'        | data 6 json  
entry 1 | data 'as' json | 'asdf'        | data 3 json        <-- table

The key to writing this query is understanding that you just need to fetch all the data for a single section; the rest you simply join. With your schema you also can't filter data by section directly, so you'll need to join entries just for that:

SELECT d.* FROM data d JOIN entries e ON (d.entry = e.id) 
WHERE e.section = ?

You can then join field to each row to get defaults, types and titles:

SELECT d.*, f.title, f.type, f."default" 
FROM data d JOIN entries e ON (d.entry = e.id) 
            JOIN fields f ON (d.field = f.id)
WHERE e.section = ?

Or you can select fields in a separate query to save some network traffic.
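Spelled out, that two-query approach might look like this (a sketch against the schema above; the ? placeholders are bound by your application):

```sql
-- Query 1: field metadata (titles, types, defaults) for the section
SELECT id, title, type, "default"
FROM   fields
WHERE  section = ?;

-- Query 2: only the raw data rows for that section
SELECT d.*
FROM   data d
JOIN   entries e ON d.entry = e.id
WHERE  e.section = ?;
```

You then stitch the two result sets together in application code, substituting the default wherever no data row exists.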

So that was the answer; now some bonus advice:

  1. Use foreign keys instead of plain integers to refer to other tables; the database will then check consistency for you.

  2. By convention, relations (tables) are named in the singular, so: section, entry and field.

  3. Referencing columns are conventionally named <name>_id, e.g. field_id or section_id.

  4. The whole point of JSON fields is to store a collection without a statically defined schema, so it would make much more sense to drop the entries and data tables and use a single table whose JSON contains all the fields instead.

Like this:

CREATE TABLE row ( -- a less generic name would be even better
    id int PRIMARY KEY,
    section_id int REFERENCES section (id),
    data json
);

With the data field containing something like:

{
    "title": "iPhone 6",
    "price": 650,
    "available": true,
    ...
}
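With this single-table design, both the sorting and the defaults become trivial. A sketch (the price key and its fallback value are assumptions for illustration):

```sql
-- Sort a section by any JSON key, falling back to a default when it's missing
SELECT *
FROM   row
WHERE  section_id = 1
ORDER  BY COALESCE(data->>'price', '650') DESC;
```

Note that ->> returns text, so a true numeric sort would need a cast such as (data->>'price')::numeric.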

@Suor has provided good advice, some of which you already accepted. I am building on the updated schema.

Schema

CREATE TABLE section (
  section_id serial PRIMARY KEY,
  title text,
  grp integer
);

CREATE TABLE field (
  field_id serial PRIMARY KEY,
  section_id integer REFERENCES section,
  title text,
  type text,
  default_val json
);

CREATE TABLE entry (
  entry_id serial PRIMARY KEY,
  section_id integer REFERENCES section
);

CREATE TABLE data (
  data_id serial PRIMARY KEY,
  field_id integer REFERENCES field,
  entry_id integer REFERENCES entry,
  data json
);

I changed two more details:

  • section_id instead of id, etc. "id" as a column name is an anti-pattern that became popular because a couple of ORMs use it. Don't. Descriptive names are much better. Identical names for identical content is a helpful guideline. It also allows you to use the USING shortcut in join clauses.

  • Don't use reserved words as identifiers. Stick to legal, lower-case, unquoted names to make your life easier.
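To illustrate the USING shortcut mentioned above (my example, not from the original answer), these two joins are equivalent because both tables name the column section_id:

```sql
-- Explicit ON clause
SELECT * FROM entry e JOIN field f ON e.section_id = f.section_id;

-- USING shortcut; also collapses the join column into a single output column
SELECT * FROM entry e JOIN field f USING (section_id);
```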

Referential integrity?

There is another inherent weakness in your design: what stops rows in data from referencing a field and an entry that don't belong to the same section? There is a closely related question on dba.SE.
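One common fix (a sketch of the standard technique, not taken from the linked answer) is to carry section_id redundantly in data and enforce consistency with multicolumn foreign keys:

```sql
-- Multicolumn keys let the database guarantee field and entry share a section
ALTER TABLE field ADD UNIQUE (field_id, section_id);
ALTER TABLE entry ADD UNIQUE (entry_id, section_id);

ALTER TABLE data ADD COLUMN section_id integer;
ALTER TABLE data
  ADD FOREIGN KEY (field_id, section_id) REFERENCES field (field_id, section_id),
  ADD FOREIGN KEY (entry_id, section_id) REFERENCES entry (entry_id, section_id);
```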

Query

Not sure if you need the complex design at all. But to answer the question, this is the base query:

SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM   entry     e
JOIN   field     f USING (section_id)
LEFT   JOIN data d USING (field_id, entry_id)  -- can be missing
WHERE  e.section_id = 1
ORDER  BY 1, 2;

The LEFT JOIN is crucial to allow for missing data entries and use the default instead.

SQL Fiddle.

crosstab()

The final step is cross tabulation. I cannot show this in SQL Fiddle since the additional module tablefunc is not installed there.

Basics for crosstab() :

SELECT * FROM crosstab(
   $$
   SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
   FROM   entry     e
   JOIN   field     f USING (section_id)
   LEFT   JOIN data d USING (field_id, entry_id)  -- can be missing
   WHERE  e.section_id = 1
   ORDER  BY 1, 2
   $$
  ,$$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$$
   ) AS ct (entry int, f1 json, f2 json, f3 json)  -- static
ORDER  BY f3->>'a';  -- static

The tricky part here is the return type of the function. I provided a static type for 3 fields, but you really want that to be dynamic. Also, I am referencing a key in the json value that may or may not be there. So build that query dynamically and execute it in a second call.
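One way to build that query dynamically is a helper that assembles the column definition list from field and returns the finished crosstab() statement, which you then run in a second round trip. A sketch under the schema above; the function name and the f-prefixed column names are my assumptions:

```sql
CREATE OR REPLACE FUNCTION section_pivot_sql(_section_id int)
  RETURNS text
  LANGUAGE sql AS
$func$
SELECT format(
  'SELECT * FROM crosstab(
      $q$SELECT entry_id, field_id, COALESCE(d.data, f.default_val)
         FROM   entry e
         JOIN   field f USING (section_id)
         LEFT   JOIN data d USING (field_id, entry_id)
         WHERE  e.section_id = %1$s
         ORDER  BY 1, 2$q$
     ,$c$SELECT field_id FROM field WHERE section_id = %1$s ORDER BY field_id$c$
   ) AS ct (entry_id int, %2$s)'
 , _section_id
 , string_agg(format('%I json', 'f' || field_id), ', ' ORDER BY field_id)  -- one json column per field
)
FROM   field
WHERE  section_id = _section_id;
$func$;
```

Run SELECT section_pivot_sql(1) first, then execute the returned statement, either from the client or via EXECUTE in a PL/pgSQL function.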

