How to GROUP BY after JOINing multiple tables in SQLAlchemy

I'm very new to Flask/SQLAlchemy and I'm trying to get a summary of answers for an MTurk survey, like so:

Filename    Answered_A    Answered_B    Answered_C    Answered_D    Answered_E
file1.mp3   10            8             5             0             1
file2.mp3   1             26            2             3             7
file3.mp3   4             0             0             3             57
file4.mp3   1             6             1             5             28

I have the following models (irrelevant fields omitted for brevity):

class Survey(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tasks = db.relationship('Task', backref='survey', lazy='dynamic')

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    survey_id = db.Column(db.Integer, db.ForeignKey('survey.id'))
    assignments = db.relationship('Assignment', backref='task', lazy='dynamic')

class Assignment(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    task_id = db.Column(db.Integer, db.ForeignKey('task.id'))
    responses = db.relationship('Response', backref='assignment', lazy='dynamic')

class Response(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    assignment_id = db.Column(db.Integer, db.ForeignKey('assignment.id'))
    response_item = db.Column(db.String(255))
    response_value = db.Column(db.String(255))

Here response_item is the file name, and response_value is a value from 1 to 5, shown above as Answered_A through Answered_E. The models above form a chain of cascading one-to-many relationships.

I've followed the approach from "Join multiple tables in SQLAlchemy/Flask", like so:

q = (db.session.query(Survey, Task, Assignment, Response)
    .join(Task, Survey.id==Task.survey_id)
    .join(Assignment, Task.id==Assignment.task_id)
    .join(Response, Assignment.id==Response.assignment_id)).all()

This returns a list of tuples of the form (Survey, Task, Assignment, Response), as in that question.

What I'd like to accomplish is a query with the correct GROUP BY clauses for a given survey (Survey.id == 4, for example) that returns the structure listed above. The answers, as mentioned, range from Answered_A to Answered_E, or from 1 to 5 if that makes it easier.

I made a GitHub repository showing how to do this:

https://github.com/researcher2/stackoverflow_57023616

As I didn't have access to your data, I mocked some up; see create_db.py.

I make a row for each file name holding a count for each possible choice (starting at 0), then go through the Responses we get back from the database and increment the counts.

I may come back to this tomorrow and play around with the SQL.

server.py

from app import app, db
from flask import render_template
from models import Survey, Task, Assignment, Response

@app.route('/')
def index():
    (headers, fields, data) = getSummary()
    return render_template("survey_summary.html", headers=headers, fields=fields, data=data)

def getSummary():
    fields = ["Filename", "A", "B", "C", "D", "E"] # column names for output
    headers = dict() # custom header names for given fieldname (no difference here)
    for field in fields:
        headers[field] = field

    # build data structures
    data = []
    rowMap = dict()    
    fileNames = ["file1.mp3", "file2.mp3", "file3.mp3", "file4.mp3"]    

    for fileName in fileNames:
        row = dict()
        row["Filename"] = fileName
        row["A"] = 0
        row["B"] = 0
        row["C"] = 0
        row["D"] = 0
        row["E"] = 0
        data.append(row)
        rowMap[fileName] = row

    # query
    query = db.session.query(Survey, Task, Assignment, Response) \
                      .join(Task, Survey.id==Task.survey_id) \
                      .join(Assignment, Task.id==Assignment.task_id) \
                      .join(Response, Assignment.id==Response.assignment_id) \
                      .filter(Survey.id == 1)

    results = query.all()

    # summarise counts
    for (_, _, _, response) in results:
        rowMap[response.response_item][response.response_value] += 1

    return (headers, fields, data)
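
The counting step above relies on a hardcoded list of file names and on response_value already being one of "A" to "E". As a minimal sketch of the same idea without those assumptions, the rows can be derived from whatever (response_item, response_value) pairs come back from the database. The names summarise, VALUE_TO_COLUMN and COLUMNS are hypothetical and not part of the repository, and the mapping from "1".."5" to "A".."E" is an assumption based on the question's description.

```python
from collections import defaultdict

# Hypothetical mapping from stored values "1".."5" to the output columns.
VALUE_TO_COLUMN = {"1": "A", "2": "B", "3": "C", "4": "D", "5": "E"}
COLUMNS = ["A", "B", "C", "D", "E"]


def summarise(responses):
    """Count answers per file from (response_item, response_value) pairs."""
    # Each file seen for the first time gets a fresh row of zero counts.
    counts = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for item, value in responses:
        # Accept either "1".."5" or "A".."E" as the stored value.
        column = VALUE_TO_COLUMN.get(value, value)
        if column not in COLUMNS:
            continue  # skip values outside the expected range
        counts[item][column] += 1
    # One dict per file, sorted by file name, in the shape the template expects.
    return [{"Filename": item, **counts[item]} for item in sorted(counts)]


# Made-up sample input standing in for the query results.
rows = summarise([
    ("file1.mp3", "1"),
    ("file1.mp3", "1"),
    ("file2.mp3", "5"),
    ("file1.mp3", "B"),
])
```

With the query in getSummary, the input pairs would be something like ((r.response_item, r.response_value) for (_, _, _, r) in results), and the returned list could be passed to the template as data.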

templates/survey_summary.html

I use something similar to this template for most table output these days: I build up the headers, fields and data collections first. I need to look into pandas; I'd imagine somebody has done something similar there.

<html>
<head>
    <title>mturk survey summary</title>
</head>
<body>
    <table>
        <tr>
            {% for field in fields %}
            <th>{{headers[field]}}</th>
            {% endfor %}
        </tr>
        {% for row in data %}
        <tr>
            {% for field in fields %}
            <td>
                {{ row[field] }}
            </td>
            {% endfor %}
        </tr>
        {% endfor %}
    </table>
</body>
</html>

OK, I came back and did the SQL. You can swap this in if you want:

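from sqlalchemy import func  # provides func.count

# Roughly equivalent SQL:
# select response_item, response_value, count(response_value)
# from response
#   join assignment on response.assignment_id = assignment.id
#   join task on assignment.task_id = task.id
#   join survey on survey.id = task.survey_id
# where survey.id = 1
# group by response_item, response_value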
query = db.session.query(Response.response_item, Response.response_value, func.count(Response.response_value)) \
                  .join(Assignment, Response.assignment_id == Assignment.id) \
                  .join(Task, Assignment.task_id==Task.id) \
                  .join(Survey, Survey.id==Task.survey_id) \
                  .filter(Survey.id == 1) \
                  .group_by(Response.response_item, Response.response_value)

print(query)
results = query.all()

for (item, value, count) in results:
    rowMap[item][value] = count
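
The grouped query above returns one row per (file, answer) pair, which is then pivoted in Python. The pivot can also be pushed into SQL with conditional aggregation, so each file comes back as a single row. The sketch below illustrates that technique using the standard-library sqlite3 module rather than SQLAlchemy, to stay dependency-free. The table layout mirrors the models in the question, the sample rows are made up, and the stored values are assumed to be "A" to "E".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (id INTEGER PRIMARY KEY);
    CREATE TABLE task (id INTEGER PRIMARY KEY, survey_id INTEGER REFERENCES survey(id));
    CREATE TABLE assignment (id INTEGER PRIMARY KEY, task_id INTEGER REFERENCES task(id));
    CREATE TABLE response (
        id INTEGER PRIMARY KEY,
        assignment_id INTEGER REFERENCES assignment(id),
        response_item VARCHAR(255),
        response_value VARCHAR(255)
    );
    -- Made-up sample data: one survey, one task, two assignments.
    INSERT INTO survey (id) VALUES (1);
    INSERT INTO task (id, survey_id) VALUES (1, 1);
    INSERT INTO assignment (id, task_id) VALUES (1, 1), (2, 1);
    INSERT INTO response (assignment_id, response_item, response_value) VALUES
        (1, 'file1.mp3', 'A'), (2, 'file1.mp3', 'A'),
        (1, 'file2.mp3', 'B'), (2, 'file2.mp3', 'E');
""")

# One output row per file; each SUM(CASE ...) counts a single answer choice.
PIVOT_SQL = """
    SELECT r.response_item AS filename,
           SUM(CASE WHEN r.response_value = 'A' THEN 1 ELSE 0 END) AS answered_a,
           SUM(CASE WHEN r.response_value = 'B' THEN 1 ELSE 0 END) AS answered_b,
           SUM(CASE WHEN r.response_value = 'C' THEN 1 ELSE 0 END) AS answered_c,
           SUM(CASE WHEN r.response_value = 'D' THEN 1 ELSE 0 END) AS answered_d,
           SUM(CASE WHEN r.response_value = 'E' THEN 1 ELSE 0 END) AS answered_e
    FROM response r
    JOIN assignment a ON r.assignment_id = a.id
    JOIN task t ON a.task_id = t.id
    JOIN survey s ON t.survey_id = s.id
    WHERE s.id = ?
    GROUP BY r.response_item
    ORDER BY r.response_item
"""

pivot_rows = conn.execute(PIVOT_SQL, (1,)).fetchall()
for row in pivot_rows:
    print(row)
```

In SQLAlchemy the same shape can be built from func.sum and case, but the call signature of case differs between SQLAlchemy versions, so check the documentation for the version you have installed.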
