How to GROUP BY after JOINing multiple tables in SQLAlchemy

I'm very new to Flask/SQLAlchemy and I'm trying to get a summary of answers for an MTurk survey, like so:

Filename    Answered_A    Answered_B    Answered_C    Answered_D    Answered_E
file1.mp3   10            8             5             0             1
file2.mp3   1             26            2             3             7
file3.mp3   4             0             0             3             57
file4.mp3   1             6             1             5             28

I have the following models (irrelevant fields omitted for brevity):

class Survey(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tasks = db.relationship('Task', backref='survey', lazy='dynamic')

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    survey_id = db.Column(db.Integer, db.ForeignKey('survey.id'))
    assignments = db.relationship('Assignment', backref='task', lazy='dynamic')

class Assignment(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    task_id = db.Column(db.Integer, db.ForeignKey('task.id'))
    responses = db.relationship('Response', backref='assignment', lazy='dynamic')

class Response(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    assignment_id = db.Column(db.Integer, db.ForeignKey('assignment.id'))
    response_item = db.Column(db.String(255))
    response_value = db.Column(db.String(255))

Here response_item is the Filename, and response_value is 1-5, corresponding to Answered_A, Answered_B, and so on. The models above form a chain of one-to-many relationships.

I've followed the approach tried here: ( Join multiple tables in SQLAlchemy/Flask ) like so:

q = (db.session.query(Survey, Task, Assignment, Response)
    .join(Task, Survey.id==Task.survey_id)
    .join(Assignment, Task.id==Assignment.task_id)
    .join(Response, Assignment.id==Response.assignment_id)).all()

and it results in a list of tuples, as in the linked question: (Survey, Task, Assignment, Response).

What I'd like to accomplish is a query with the correct GROUP BY clauses for a given survey (Survey.id == 4, for example) that returns the structure listed above. The answers, as mentioned, range from Answered_A to Answered_E, or from 1-5 if that makes it easier.

I made a github for you showing how to do this:

https://github.com/researcher2/stackoverflow_57023616

As I didn't have access to your data, I made a mock-up, which can be found in create_db.py.

I make a row for each file name, holding a count for each of its possible choices (starting at 0). Then I go through the Responses we get back from the db and increment the matching count.
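The counting idea can be sketched independently of Flask and the database. This is only a minimal illustration: the tally function and the sample pairs are made up for the example, and it assumes each response has already been reduced to a (response_item, response_value) pair with values "A" to "E". Unlike the version in server.py, it creates rows as file names are encountered instead of listing them up front.

```python
CHOICES = ["A", "B", "C", "D", "E"]


def tally(responses, choices=CHOICES):
    """Count (response_item, response_value) pairs into one row per file name."""
    rows = {}
    for item, value in responses:
        if item not in rows:
            # First time this file name is seen: start every choice at 0.
            rows[item] = {"Filename": item, **{choice: 0 for choice in choices}}
        # Raises KeyError if value is not one of the known choices.
        rows[item][value] += 1
    # Dicts keep insertion order, so rows come back in first-seen order.
    return list(rows.values())


# Made-up sample data, for illustration only.
sample = [
    ("file1.mp3", "A"),
    ("file1.mp3", "A"),
    ("file1.mp3", "C"),
    ("file2.mp3", "E"),
]
summary = tally(sample)
for row in summary:
    print(row)
```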

I may come back to this tomorrow and play around with the SQL.

server.py

from app import app, db
from flask import render_template
from models import Survey, Task, Assignment, Response

@app.route('/')
def index():
    (headers, fields, data) = getSummary()
    return render_template("survey_summary.html", headers=headers, fields=fields, data=data)

def getSummary():
    fields = ["Filename", "A", "B", "C", "D", "E"] # column names for output
    headers = dict() # custom header names for given fieldname (no difference here)
    for field in fields:
        headers[field] = field

    # build data structures
    data = []
    rowMap = dict()    
    fileNames = ["file1.mp3", "file2.mp3", "file3.mp3", "file4.mp3"]    

    for fileName in fileNames:
        row = dict()
        row["Filename"] = fileName
        row["A"] = 0
        row["B"] = 0
        row["C"] = 0
        row["D"] = 0
        row["E"] = 0
        data.append(row)
        rowMap[fileName] = row

    # query
    query = db.session.query(Survey, Task, Assignment, Response) \
                      .join(Task, Survey.id==Task.survey_id) \
                      .join(Assignment, Task.id==Assignment.task_id) \
                      .join(Response, Assignment.id==Response.assignment_id) \
                      .filter(Survey.id == 1)

    results = query.all()

    # summarise counts
    for (_, _, _, response) in results:
        rowMap[response.response_item][response.response_value] += 1

    return (headers, fields, data)
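
Note that the counting loop indexes each row directly by response_value, so it assumes the values are stored as "A" to "E" (as in my mock-up). If your table stores "1" to "5" as described in the question, the lookup would raise a KeyError, and you would need a small translation step first. A possible sketch, where VALUE_TO_COLUMN and column_for are hypothetical names that are not part of the repo:

```python
# Hypothetical mapping from stored values "1".."5" to the column keys "A".."E".
VALUE_TO_COLUMN = {str(i): letter for i, letter in enumerate("ABCDE", start=1)}


def column_for(response_value):
    """Return the summary column for a stored value, or None if unrecognised."""
    value = str(response_value).strip()
    if value in VALUE_TO_COLUMN.values():
        # Already stored as a letter.
        return value
    return VALUE_TO_COLUMN.get(value)


print(column_for("1"))  # numeric string mapped to a letter
print(column_for("C"))  # letters pass through unchanged
```

The loop would then look up column_for(response.response_value) and skip the response (or log it) when the result is None.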

templates/survey_summary.html

I use something similar to this template for most table output these days and just build up the headers, fields and data collections first. I still need to look into pandas; I would imagine somebody has done something similar there.

<html>
<head>
    <title>mturk survey summary</title>
</head>
<body>
    <table>
        <tr>
            {% for field in fields %}
            <th>{{headers[field]}}</th>
            {% endfor %}
        </tr>
        {% for row in data %}
        <tr>
            {% for field in fields %}
            <td>
                {{ row[field] | safe }}
            </td>
            {% endfor %}
        </tr>
        {% endfor %}
    </table>
</body>
</html>

OK, I came back and did the SQL. You can swap this in for the query and counting loop in getSummary if you want:

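from sqlalchemy import func  # add this to the imports at the top of server.py

# Roughly equivalent SQL:
# select response_item, response_value, count(response_value)
# from response
# join assignment on response.assignment_id = assignment.id
# join task on assignment.task_id = task.id
# join survey on survey.id = task.survey_id
# where survey.id = 1
# group by response_item, response_value
query = db.session.query(Response.response_item, Response.response_value, func.count(Response.response_value)) \
                  .join(Assignment, Response.assignment_id == Assignment.id) \
                  .join(Task, Assignment.task_id == Task.id) \
                  .join(Survey, Survey.id == Task.survey_id) \
                  .filter(Survey.id == 1) \
                  .group_by(Response.response_item, Response.response_value)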

print(query)
results = query.all()

for (item, value, count) in results:
    rowMap[item][value] = count
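
If you would rather have the database return the final table shape directly (one row per file, one count column per answer), conditional aggregation is one option. Below is a minimal sketch using the standard-library sqlite3 module rather than SQLAlchemy, so that it runs on its own. The table names assume Flask-SQLAlchemy's default naming for the models in the question, the sample rows are made up, response_value is assumed to hold the letters A to E, and build_demo_db and survey_summary are hypothetical helper names.

```python
import sqlite3

CHOICES = ["A", "B", "C", "D", "E"]


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    # The last response row belongs to survey 2 and must not be counted for survey 1.
    conn.executescript("""
        CREATE TABLE survey (id INTEGER PRIMARY KEY);
        CREATE TABLE task (id INTEGER PRIMARY KEY, survey_id INTEGER);
        CREATE TABLE assignment (id INTEGER PRIMARY KEY, task_id INTEGER);
        CREATE TABLE response (
            id INTEGER PRIMARY KEY,
            assignment_id INTEGER,
            response_item VARCHAR(255),
            response_value VARCHAR(255)
        );
        INSERT INTO survey (id) VALUES (1), (2);
        INSERT INTO task (id, survey_id) VALUES (1, 1), (2, 2);
        INSERT INTO assignment (id, task_id) VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO response (assignment_id, response_item, response_value) VALUES
            (1, 'file1.mp3', 'A'),
            (2, 'file1.mp3', 'A'),
            (2, 'file1.mp3', 'B'),
            (1, 'file2.mp3', 'E'),
            (3, 'file1.mp3', 'C');
    """)
    return conn


def survey_summary(conn, survey_id, choices=CHOICES):
    """Return one (filename, count_A, ..., count_E) tuple per file for a survey."""
    # One SUM(CASE ...) column per choice; the choice values are bound as parameters.
    count_columns = ", ".join(
        "SUM(CASE WHEN r.response_value = ? THEN 1 ELSE 0 END)" for _ in choices
    )
    sql = f"""
        SELECT r.response_item, {count_columns}
        FROM response AS r
        JOIN assignment AS a ON r.assignment_id = a.id
        JOIN task AS t ON a.task_id = t.id
        WHERE t.survey_id = ?
        GROUP BY r.response_item
        ORDER BY r.response_item
    """
    return conn.execute(sql, [*choices, survey_id]).fetchall()


conn = build_demo_db()
rows = survey_summary(conn, 1)
for row in rows:
    print(row)
```

The same query can presumably be written in SQLAlchemy with func.sum and case, but I have not tried that against the repo. One thing to keep in mind is that a file with no responses at all will not appear in the result, whereas the rowMap approach above still shows it with zeros.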
