Why is SQLAlchemy count() much slower than the raw query?
I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?
# this takes over 3 seconds to return
session.query(Segment).count()
However:
SELECT COUNT(*) FROM segments;
+----------+
| COUNT(*) |
+----------+
| 281992 |
+----------+
1 row in set (0.07 sec)
The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).
Update
Using session.query(Segment.id).count() instead of session.query(Segment).count() seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.
Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id)):
Return a count of rows this Query would return.
This generates the SQL for this Query as follows:
SELECT count(1) AS count_1 FROM ( SELECT <rest of query follows...> ) AS anon_1
For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), ie:
from sqlalchemy import func

# count User records, without
# using a subquery.
session.query(func.count(User.id))

# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
        group_by(User.name)

from sqlalchemy import distinct

# count distinct "name" values
session.query(func.count(distinct(User.name)))
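For anyone who wants to try the docs snippet end to end, here is a minimal runnable sketch. The User model, the sample names, and the in-memory SQLite engine are all assumptions made for illustration (the thread itself is about MySQL), and the grouped query also selects User.name so the counts are readable, which the docs snippet does not do.

```python
from sqlalchemy import Column, Integer, String, create_engine, distinct, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    # Hypothetical model, only here to make the snippet runnable.
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")  # in-memory SQLite stand-in, not MySQL
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name="alice"), User(name="bob"), User(name="alice")])
    session.commit()

    # Count all rows without wrapping the query in a subquery.
    total = session.query(func.count(User.id)).scalar()

    # Count per name; selecting User.name as well labels each count.
    rows = session.query(User.name, func.count(User.id)).group_by(User.name).all()
    per_name = {name: n for name, n in rows}

    # Count distinct name values.
    distinct_names = session.query(func.count(distinct(User.name))).scalar()

print(total, per_name, distinct_names)
```

Note that the query object on its own does nothing until you call something like scalar() or all() on it.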
It took me a long time to find this as the solution to my problem. I was getting the following error:
sqlalchemy.exc.DatabaseError: (mysql.connector.errors.DatabaseError) 126 (HY000): Incorrect key file for table '/tmp/#sql_40ab_0.MYI'; try to repair it
The problem was resolved when I changed this:
query = session.query(rumorClass).filter(rumorClass.exchangeDataState == state)
return query.count()
to this:
query = session.query(func.count(rumorClass.id)).filter(rumorClass.exchangeDataState == state)
return query.scalar()
The reason is that SQLAlchemy's count() counts the results of a subquery, and that subquery still does the full work of retrieving the rows you are counting. This behavior is agnostic of the underlying database; it isn't a problem with MySQL.
The SQLAlchemy docs explain how to issue a count without a subquery by importing func from sqlalchemy:
session.query(func.count(User.id)).scalar()
# emits: SELECT count(users.id) AS count_1 FROM users
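The same subquery-free count can also be written with the select() construct available in SQLAlchemy 1.4 and later. This is a sketch, not something taken from the answers above: the User model, the sample rows, and the in-memory SQLite engine are assumptions for illustration. Note that func.count() with no argument emits count(*) rather than count(users.id).

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    # Hypothetical model, only here to make the snippet runnable.
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")  # in-memory SQLite stand-in
Base.metadata.create_all(engine)

# COUNT(*) taken directly from the users table, no subquery.
count_all = select(func.count()).select_from(User)
# The same statement with a filter, mirroring the filtered example earlier.
count_named = count_all.where(User.name == "alice")

with Session(engine) as session:
    session.add_all([User(name="alice"), User(name="bob"), User(name="alice")])
    session.commit()
    total = session.scalar(count_all)
    alices = session.scalar(count_named)

print(count_all)
print(total, alices)
```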