How to calculate percentage of a group in SQLAlchemy?

I am building a "quiz app" in Python, and I need to store the results in a SQL database. I want to use the SQLAlchemy Python library to interact with the database. Each user of my app will be asked 3 randomly selected questions from a predetermined set of 100 possible questions. Each question can only be answered "Yes" or "No" (i.e. True or False). I store answers in a table defined as follows:

class Answer(Base):
    __tablename__ = "Answers"
    
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("Users.id"), nullable=False)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)
    
    user = relationship("User", back_populates="answers")

After all users complete the quiz, I calculate how many times each question was answered by users:

tot_each_question = (db_session
                     .query(Answer.question_id,
                            count_questions.label("tot_answers_for_question"))
                     .group_by(Answer.question_id)
                     )

I can also calculate how many times each question was answered "Yes" (i.e. True) by users:

tot_true_for_question = (db_session
                         .query(Answer.question_id,
                                count_questions.label("tot_true_for_question"))
                         .filter(Answer.answer == True)
                         .group_by(Answer.question_id)
                         )
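
The two queries above use `count_questions` without showing its definition. Here is a minimal runnable sketch, assuming it is something like `func.count(Answer.id)`. That definition, the in-memory SQLite engine, and the sample rows are assumptions for illustration and not part of the question; SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import Boolean, Column, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Answer(Base):
    __tablename__ = "Answers"

    id = Column(Integer, primary_key=True)
    # The ForeignKey to Users is left out so the sketch runs on its own.
    user_id = Column(Integer, nullable=False)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)


engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

# ASSUMPTION: the question never shows this definition.
count_questions = func.count(Answer.id)

with Session(engine) as db_session:
    # Hypothetical sample data: question 1 has 3 answers, question 2 has 2.
    db_session.add_all([
        Answer(user_id=1, question_id=1, answer=True),
        Answer(user_id=2, question_id=1, answer=True),
        Answer(user_id=3, question_id=1, answer=False),
        Answer(user_id=1, question_id=2, answer=False),
        Answer(user_id=2, question_id=2, answer=False),
    ])
    db_session.commit()

    tot_each_question = (
        db_session.query(Answer.question_id,
                         count_questions.label("tot_answers_for_question"))
        .group_by(Answer.question_id)
    )
    tot_true_for_question = (
        db_session.query(Answer.question_id,
                         count_questions.label("tot_true_for_question"))
        .filter(Answer.answer == True)  # noqa: E712 (SQL comparison, not Python)
        .group_by(Answer.question_id)
    )

    dict_tot_each_question = {
        row.question_id: row.tot_answers_for_question
        for row in tot_each_question.all()
    }
    dict_tot_true_for_question = {
        row.question_id: row.tot_true_for_question
        for row in tot_true_for_question.all()
    }

print(dict_tot_each_question)
print(dict_tot_true_for_question)
```

Note that a question with no "Yes" answers is simply missing from the filtered query, so any code that combines the two results has to fall back to 0 for it.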

How do I calculate the percentage of "Yes" answers for each question with SQLAlchemy? I can easily do that with basic Python dictionaries:

dict_tot_each_question = {row.question_id: row.tot_answers_for_question
                          for row in tot_each_question.all()}

dict_tot_true_for_question = {row.question_id: row.tot_true_for_question
                              for row in tot_true_for_question.all()}

dict_percent_true_for_question = {}
for question_id, tot_answers in dict_tot_each_question.items():
    tot_true = dict_tot_true_for_question.get(question_id, 0)
    percent_true = tot_true / tot_answers * 100
    dict_percent_true_for_question[question_id] = percent_true

But I would prefer to use SQLAlchemy functionality to obtain the same result. Is it possible to do that in SQLAlchemy? Would it be convenient and efficient, or would my solution based on Python dictionaries be better for any reason?

Just combining the two expressions from the two queries you already have into one will give you the desired result:

q = (
    session.query(
        Question.id,
        (100 * func.sum(cast(Answer.answer, Integer)) / func.count(Answer.answer)).label("perc_true"),
    )
    .outerjoin(Answer)
    .group_by(Question.id)
)

As you can see above, I used the COUNT function to count all the answers.
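
One thing that may be worth checking is the arithmetic type of the result. Depending on the database and the SQLAlchemy version, dividing one integer expression by another can truncate, so 2 "Yes" answers out of 3 could come back as 66 rather than 66.67. The sketch below is a variation on the query above, not the original: it multiplies by `100.0` to force floating-point arithmetic. The models are stripped down, the data is hypothetical, and the `select_from` call and explicit ON clause are only there to make the join unambiguous. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database.

```python
from sqlalchemy import (
    Boolean, Column, ForeignKey, Integer, cast, create_engine, func,
)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Question(Base):
    __tablename__ = "questions"
    id = Column(Integer, primary_key=True)


class Answer(Base):
    __tablename__ = "answers"
    id = Column(Integer, primary_key=True)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)


engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical data: question 1 is 2/3 "Yes", question 2 is 0/2 "Yes".
    session.add_all([
        Question(id=1),
        Question(id=2),
        Answer(question_id=1, answer=True),
        Answer(question_id=1, answer=True),
        Answer(question_id=1, answer=False),
        Answer(question_id=2, answer=False),
        Answer(question_id=2, answer=False),
    ])
    session.commit()

    yes_count = func.sum(cast(Answer.answer, Integer))
    total = func.count(Answer.answer)

    q = (
        session.query(
            Question.id,
            # 100.0 (a float) instead of 100 keeps the division fractional.
            (100.0 * yes_count / total).label("perc_true"),
        )
        .select_from(Question)
        .outerjoin(Answer, Answer.question_id == Question.id)
        .group_by(Question.id)
        .order_by(Question.id)
    )
    results = [(row.id, row.perc_true) for row in q]

for question_id, perc_true in results:
    print(question_id, perc_true)
```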

Another item to note is that my query starts with Question and JOINs the Answer table. The reason for this is that if there is a Question with no answers, you will still see (#id, NULL) returned, instead of not seeing a row at all as you would if you used only the Answers table. But if you do not care about handling this corner case, you could do it your way:

q = (
    session.query(
        Answer.question_id,
        (100 * func.sum(cast(Answer.answer, Integer)) / func.count(Answer.answer)).label("perc_true"),
    )
    .group_by(Answer.question_id)
)
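
To make the difference between the two query shapes concrete, here is a small sketch using plain SQL through Python's built-in `sqlite3` module. The SQL is hand-written to approximate what the two queries do, not the exact statements SQLAlchemy emits, and the table contents are hypothetical: question 1 has two answers and question 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        question_id INTEGER REFERENCES questions(id),
        answer BOOLEAN NOT NULL
    );
    INSERT INTO questions (id) VALUES (1), (2);
    INSERT INTO answers (question_id, answer) VALUES (1, 1), (1, 0);
""")

# Starting from questions: the unanswered question still yields a row.
with_outer_join = conn.execute("""
    SELECT q.id, 100.0 * SUM(a.answer) / COUNT(a.answer)
    FROM questions AS q
    LEFT OUTER JOIN answers AS a ON a.question_id = q.id
    GROUP BY q.id
    ORDER BY q.id
""").fetchall()

# Starting from answers only: the unanswered question is absent.
answers_only = conn.execute("""
    SELECT question_id, 100.0 * SUM(answer) / COUNT(answer)
    FROM answers
    GROUP BY question_id
    ORDER BY question_id
""").fetchall()

print(with_outer_join)  # [(1, 50.0), (2, None)]
print(answers_only)     # [(1, 50.0)]
conn.close()
```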

Finally, one more assumption I made is that your database will handle true as 1, so that SUM works properly after casting to Integer. Should this not be the case, please refer to the multiple answers to this question on how to handle it: postgresql - sql - count of `true` values
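
One way to avoid relying on the boolean-to-integer cast is to map each row to 1 or 0 explicitly with a CASE expression. The sketch below is an alternative to the query above rather than the original: it uses SQLAlchemy's `case()` with positional WHEN tuples, which assumes version 1.4 or later. The stripped-down model and the sample rows are hypothetical.

```python
from sqlalchemy import Boolean, Column, Integer, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Answer(Base):
    __tablename__ = "answers"
    id = Column(Integer, primary_key=True)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)


engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

# CASE turns every row into 1 ("Yes") or 0 ("No"), so SUM never sees a boolean.
yes_count = func.sum(case((Answer.answer.is_(True), 1), else_=0))
perc_true = (100.0 * yes_count / func.count(Answer.id)).label("perc_true")

with Session(engine) as session:
    # Hypothetical data: question 1 is 2/4 "Yes", question 2 is 1/1 "Yes".
    session.add_all([
        Answer(question_id=1, answer=True),
        Answer(question_id=1, answer=False),
        Answer(question_id=1, answer=False),
        Answer(question_id=1, answer=True),
        Answer(question_id=2, answer=True),
    ])
    session.commit()

    q = (
        session.query(Answer.question_id, perc_true)
        .group_by(Answer.question_id)
        .order_by(Answer.question_id)
    )
    percentages = {row.question_id: row.perc_true for row in q}

print(percentages)  # {1: 50.0, 2: 100.0}
```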


BONUS:

When I find myself asking aggregation-related questions at the model level, I often implement them on the model directly using the Hybrid Attributes extension.

The code below will give you an indication of how you could use it for your case:

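from sqlalchemy import (
    Boolean, Column, ForeignKey, Integer, Numeric, String, case, cast, func, select,
)
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Answer(Base):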
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    # user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)

    # user = relationship("User", back_populates="answers")
    question = relationship("Question", back_populates="answers")


class Question(Base):
    __tablename__ = "questions"

    id = Column(Integer, primary_key=True)
    question = Column(String, nullable=False)

    answers = relationship("Answer", back_populates="question")

    @hybrid_property
    def answers_cnt(self):
        return len(list(self.answers))

    @hybrid_property
    def answers_yes(self):
        return len(list(_ for _ in self.answers if _.answer))

    @hybrid_property
    def answers_yes_percentage(self):
        return (
            100.0 * self.answers_yes / self.answers_cnt if self.answers_cnt != 0 else None
        )

    @answers_cnt.expression
    def answers_cnt(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .label("answers_cnt")
        )

    @answers_yes.expression
    def answers_yes(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .where(Answer.answer == True)
            .label("answers_yes")
        )

    @answers_yes_percentage.expression
    def answers_yes_percentage(cls):
        return (
            case(
                (cls.answers_cnt == 0, None),  # positional WHEN tuple (1.4+); the list form is deprecated
                else_=(
                    100
                    * cast(cls.answers_yes, Numeric)
                    / cast(cls.answers_cnt, Numeric)
                ),
            )
        ).label("answers_yes_percentage")

In this case you can do the calculations either in Python or using a query.

  1. Python (this will load all Answers from the database, so it is not efficient if the data is not yet loaded into memory):

     q = session.query(Question)
     for question in q:
         print(question, question.answers_yes_percentage)

  2. Database: this is very efficient because you run just one query, similar to the separate query earlier in this answer, but the result is returned separately and as a property on the model:

     q = session.query(Question, Question.answers_yes_percentage)
     for question, percentage in q:
         print(question, percentage)

Please note that the above works with version 1.4 of SQLAlchemy, but might need different syntax for earlier versions.
