SQLAlchemy optimize join query time
I have a table of events generated by devices, with the following structure:
class Events(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    timestamp_event = db.Column(db.DateTime, nullable=False, index=True)
    device_id = db.Column(db.Integer, db.ForeignKey('devices.id'), nullable=True)
which I have to join with:
class Devices(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    dev_name = db.Column(db.String(50))
so that I can retrieve the device data for each event.
I am ranking the 20 largest event counts produced within a single hour. It already works, but as my events table grows (it is now over 1 million rows) the query gets slower and slower. Any ideas on how to optimize it? Maybe a composite index on device_id + timestamp_event? Would that still work when filtering on only part of the datetime column? Here is my code:
pkd = db.session.query(
        db.func.count(Events.id),
        db.func.date_format(Events.timestamp_event, '%d/%m %H'),
        Devices.dev_name) \
    .select_from(Events).join(Devices) \
    .filter(Events.timestamp_event >= (datetime.now() - timedelta(days=peak_days))) \
    .group_by(db.func.date_format(Events.timestamp_event, '%Y%M%D%H')) \
    .group_by(Events.device_id) \
    .order_by(db.func.count(Events.id).desc()).limit(20).all()
Here is a sample of the first 3 rows of the query output: event count, hour (DD/MM HH), and which device:
[(2710, '15/01 16', 'Device 002'),
(2612, '11/01 17', 'Device 033'),
(2133, '13/01 15', 'Device 002'),...]
And this is the SQL generated by SQLAlchemy:
SELECT count(events.id) AS count_1,
date_format(events.timestamp_event,
%(date_format_2)s) AS date_format_1,
devices.id AS devices_id,
devices.dev_name AS devices_dev_name
FROM events
INNER JOIN devices ON devices.id = events.device_id
WHERE events.timestamp_event >= %(timestamp_event_1)s
GROUP BY date_format(events.timestamp_event, %(date_format_3)s), events.device_id
ORDER BY count(events.id) DESC
LIMIT %(param_1)s
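On the composite-index idea raised above: a hedged sketch of how it could be declared, shown with plain SQLAlchemy (the Flask-SQLAlchemy version is analogous via `__table_args__` on the `db.Model`; the index name is an assumption):

```python
# Sketch: composite index on (timestamp_event, device_id).
# The leading column matches the range filter in the WHERE clause;
# device_id then helps the per-device grouping. Index name is made up.
from sqlalchemy import (Column, DateTime, ForeignKey, Index, Integer,
                        create_engine)
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Devices(Base):
    __tablename__ = 'devices'
    id = Column(Integer, primary_key=True)

class Events(Base):
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True, autoincrement=True)
    timestamp_event = Column(DateTime, nullable=False)
    device_id = Column(Integer, ForeignKey('devices.id'), nullable=True)
    __table_args__ = (
        Index('ix_events_timestamp_device', 'timestamp_event', 'device_id'),
    )

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
index_names = {ix.name for ix in Events.__table__.indexes}
print(index_names)  # contains 'ix_events_timestamp_device'
```

One caveat: since the GROUP BY wraps the timestamp in date_format(...), the index cannot be used for the grouping itself; it would mainly speed up the range filter in the WHERE clause.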
# This example is for postgresql.
# I'm not sure what db you are using but the date formatting
# is different.
with Session(engine) as session:
    # Use subquery to select top 20 event creating device ids
    # for each hour since the beginning of the peak.
    hour_fmt = "dd/Mon HH24"
    hour_col = func.to_char(Event.created_on, hour_fmt).label('event_hour')
    event_count_col = func.count(Event.id).label('event_count')
    sub_q = select(
        event_count_col,
        hour_col,
        Event.device_id
    ).filter(
        Event.created_on > get_start_of_peak()
    ).group_by(
        hour_col, Event.device_id
    ).order_by(
        event_count_col.desc()
    ).limit(
        20
    ).alias()
    # Now join in the devices to the top ids to get the names.
    results = session.execute(
        select(
            sub_q.c.event_count,
            sub_q.c.event_hour,
            Device.name
        ).join_from(
            sub_q,
            Device,
            sub_q.c.device_id == Device.id
        ).order_by(
            sub_q.c.event_count.desc(),
            Device.name
        )
    ).all()
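The approach above can be checked end to end. Here is a hedged, self-contained variant using an in-memory SQLite database, with strftime standing in for PostgreSQL's to_char (the model and column names follow the answer; the sample data is invented):

```python
# Self-contained demo of the subquery-then-join approach on SQLite.
from datetime import datetime
from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String,
                        create_engine, func, select)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Device(Base):
    __tablename__ = 'devices'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

class Event(Base):
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True)
    created_on = Column(DateTime, nullable=False, index=True)
    device_id = Column(Integer, ForeignKey('devices.id'))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    dev = Device(name='Device 001')
    session.add(dev)
    session.flush()  # assign dev.id
    # Three events in the same hour (15 Jan, 16:00-16:59).
    session.add_all(
        Event(created_on=datetime(2024, 1, 15, 16, m), device_id=dev.id)
        for m in range(3)
    )
    session.commit()

    # Subquery: top 20 (device, hour) buckets by event count.
    hour_col = func.strftime('%d/%m %H', Event.created_on).label('event_hour')
    event_count_col = func.count(Event.id).label('event_count')
    sub_q = (
        select(event_count_col, hour_col, Event.device_id)
        .group_by(hour_col, Event.device_id)
        .order_by(event_count_col.desc())
        .limit(20)
        .subquery()
    )
    # Outer query: join the small top-20 result to devices for the names.
    results = session.execute(
        select(sub_q.c.event_count, sub_q.c.event_hour, Device.name)
        .join_from(sub_q, Device, sub_q.c.device_id == Device.id)
        .order_by(sub_q.c.event_count.desc(), Device.name)
    ).all()
    print(results)  # [(3, '15/01 16', 'Device 001')]
```

The point of the rewrite is that the expensive aggregation runs once in the subquery over the narrow events table, and the join to devices only touches the 20 surviving rows.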
Note: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.