简体   繁体   中英

How to calculate percentage of a group in SQLAlchemy?

I am building a "quiz app" in Python, and I need to store results in a SQL database. I want to use SQLAlchemy Python library to interact with the database. Each user of my app will be asked 3 randomly selected questions from a predetermined set of 100 possible questions. Each question can only be answered "Yes" or "No" (ie True or False ). I store answers in a table defined as follows:

class Answer(Base):
    __tablename__ = "Answers"
    
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("Users.id"), nullable=False)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)
    
    user = relationship("User", back_populates="answers")

After all users complete the quiz, I calculate how many times a certain question was answered by users:

tot_each_question = (db_session
                     .query(Answer.question_id,
                            count_questions.label("tot_answers_for_question"))
                     .group_by(Answer.question_id)
                     )

I can also calculate how many times a certain question was answered "Yes" (ie True ) by users:

tot_true_for_question = (db_session
                         .query(Answer.question_id,
                                count_questions.label("tot_true_for_question"))
                         .filter(Answer.answer == True)
                         .group_by(Answer.question_id)
                         )

How do I calculate the percentage each question was answered "Yes" by users, with SQLAlchemy? I can easily do that with basic Python dictionaries:

dict_tot_each_question = {row.question_id: row.tot_answers_for_question
                          for row in tot_each_question.all()}

dict_tot_true_for_question = {row.question_id: row.tot_true_for_question
                              for row in tot_true_for_question.all()}

dict_percent_true_for_question = {}
for question_id, tot_answers in dict_tot_each_question.items():
    tot_true = dict_tot_true_for_question.get(question_id, 0)
    percent_true = tot_true / tot_answers * 100
    dict_percent_true_for_question[question_id] = percent_true

But I prefer to use SQLAlchemy functionality to obtain the same result. Is it possible to do that in SQLAlchemy? Would it be convenient and efficient to do that in SQLAlchemy, or would my solution based on Python dictionary be better for any reason?

Just combining two expressions from the two queries you already have into one will give you the desired result:

q = (
    session.query(
        Question.id,
        (100 * func.sum(cast(Answer.answer, Integer)) / func.count(Answer.answer)).label("perc_true"),
    )
    .outerjoin(Answer)
    .group_by(Question.id)
)

As you can see above, i used COUNT function for all the answers.

Another item to note is that my query starts with the Question and JOINs the Answer table. The reason for this is that in case there is the Question with no answers, you will still see the (#id, NULL) returned instead of not seeing a row at all if you use only Answers table. But if you do not care about this corner case being handled as I see it, you could do it your way:

q = (
    session.query(
        Answer.question_id,
        (100 * func.sum(Answer.answer) / func.count(Answer.answer)).label("perc_true"),
    )
    .group_by(Answer.question_id)
)

Finally, one more assumption i made is that your database will handle the true as 1 for the sake of proper SUM after casting to Integer . Shall this not be the case, please refer to multiple answers in this question on how to handle this: postgresql - sql - count of `true` values


BONUS:

When i find myself asking for some aggregation related questions on the model level, i often implement these on the model directly using the Hybrid Attributes extension.

Code below will give you and indication on how you could use it for your case:

class Answer(Base):
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    # user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)

    # user = relationship("User", back_populates="answers")
    question = relationship("Question", back_populates="answers")


class Question(Base):
    __tablename__ = "questions"

    id = Column(Integer, primary_key=True)
    question = Column(String, nullable=False)

    answers = relationship("Answer", back_populates="question")

    @hybrid_property
    def answers_cnt(self):
        return len(list(self.answers))

    @hybrid_property
    def answers_yes(self):
        return len(list(_ for _ in self.answers if _.answer))

    @hybrid_property
    def answers_yes_percentage(self):
        return (
            100.0 * self.answers_yes / self.answers_cnt if self.answers_cnt != 0 else None
        )

    @answers_cnt.expression
    def answers_cnt(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .label("answers_cnt")
        )

    @answers_yes.expression
    def answers_yes(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .where(Answer.answer == True)
            .label("answers_yes")
        )

    @answers_yes_percentage.expression
    def answers_yes_percentage(cls):
        return (
            case(
                [(cls.answers_cnt == 0, None)],
                else_=(
                    100
                    * cast(cls.answers_yes, Numeric)
                    / cast(cls.answers_cnt, Numeric)
                ),
            )
        ).label("answers_yes_percentage")

In this case you can do the calculations both in python or using a query.

  1. Python (this will load all Answers from the database, so not efficient if the data is not yet loaded into memory)

     q = session.query(Question) for question in q: print(question, question.answers_yes_percentage)
  2. Database: this is very efficient because you just run one query, similar to the separate query in the answer you are looking into, but the result is returned separately and as a property on the model

     q = session.query(Question, Question.answers_yes_percentage) for question, percentage in q: print(question, percentage)

Please note that above works with 1.4 version of sqlalchemy, but might need other syntax for prior versions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM