SQLAlchemy optimize join query time
I have a table of events generated by devices, with the following structure:
class Events(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    timestamp_event = db.Column(db.DateTime, nullable=False, index=True)
    device_id = db.Column(db.Integer, db.ForeignKey('devices.id'), nullable=True)
which I have to join with:
class Devices(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    dev_name = db.Column(db.String(50))
so that I can retrieve the device data for each event.
I am ranking the 20 largest event counts produced within a single hour. It already works, but as my events table grows (it is now over 1 million rows) the query gets slower and slower. Any ideas on how to optimize it? Maybe a composite index on device_id + timestamp_event? Would that still work when filtering on only part of the datetime column? Here is my code:
pkd = db.session.query(
        db.func.count(Events.id),
        db.func.date_format(Events.timestamp_event, '%d/%m %H'),
        Devices.dev_name) \
    .select_from(Events).join(Devices) \
    .filter(Events.timestamp_event >= (datetime.now() - timedelta(days=peak_days))) \
    .group_by(db.func.date_format(Events.timestamp_event, '%Y%M%D%H')) \
    .group_by(Events.device_id) \
    .order_by(db.func.count(Events.id).desc()).limit(20).all()
Here is a sample of the first 3 rows of the query output: event count, hour (DD/MM HH), and which device:
[(2710, '15/01 16', 'Device 002'),
(2612, '11/01 17', 'Device 033'),
(2133, '13/01 15', 'Device 002'),...]
And this is the SQL generated by SQLAlchemy:
SELECT count(events.id) AS count_1,
date_format(events.timestamp_event,
%(date_format_2)s) AS date_format_1,
devices.id AS devices_id,
devices.dev_name AS devices_dev_name
FROM events
INNER JOIN devices ON devices.id = events.device_id
WHERE events.timestamp_event >= %(timestamp_event_1)s
GROUP BY date_format(events.timestamp_event, %(date_format_3)s), events.device_id
ORDER BY count(events.id) DESC
LIMIT %(param_1)s
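On the composite-index idea raised above: a hedged sketch of how it could be declared, shown with plain SQLAlchemy (the Flask-SQLAlchemy version is analogous via `__table_args__` on the `db.Model`; the index name is an assumption):

```python
# Sketch: composite index on (timestamp_event, device_id).
# The leading column matches the range filter in the WHERE clause;
# device_id then helps the per-device grouping. Index name is made up.
from sqlalchemy import (Column, DateTime, ForeignKey, Index, Integer,
                        create_engine)
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Devices(Base):
    __tablename__ = 'devices'
    id = Column(Integer, primary_key=True)

class Events(Base):
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True, autoincrement=True)
    timestamp_event = Column(DateTime, nullable=False)
    device_id = Column(Integer, ForeignKey('devices.id'), nullable=True)
    __table_args__ = (
        Index('ix_events_timestamp_device', 'timestamp_event', 'device_id'),
    )

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
index_names = {ix.name for ix in Events.__table__.indexes}
print(index_names)  # contains 'ix_events_timestamp_device'
```

One caveat: since the GROUP BY wraps the timestamp in date_format(...), the index cannot be used for the grouping itself; it would mainly speed up the range filter in the WHERE clause.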
# This example is for postgresql.
# I'm not sure what db you are using but the date formatting
# is different.
with Session(engine) as session:
    # Use subquery to select top 20 event creating device ids
    # for each hour since the beginning of the peak.
    hour_fmt = "dd/Mon HH24"
    hour_col = func.to_char(Event.created_on, hour_fmt).label('event_hour')
    event_count_col = func.count(Event.id).label('event_count')
    sub_q = select(
        event_count_col,
        hour_col,
        Event.device_id
    ).filter(
        Event.created_on > get_start_of_peak()
    ).group_by(
        hour_col, Event.device_id
    ).order_by(
        event_count_col.desc()
    ).limit(
        20
    ).alias()
    # Now join in the devices to the top ids to get the names.
    results = session.execute(
        select(
            sub_q.c.event_count,
            sub_q.c.event_hour,
            Device.name
        ).join_from(
            sub_q,
            Device,
            sub_q.c.device_id == Device.id
        ).order_by(
            sub_q.c.event_count.desc(),
            Device.name
        )
    ).all()
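The approach above can be checked end to end. Here is a hedged, self-contained variant using an in-memory SQLite database, with strftime standing in for PostgreSQL's to_char (the model and column names follow the answer; the sample data is invented):

```python
# Self-contained demo of the subquery-then-join approach on SQLite.
from datetime import datetime
from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String,
                        create_engine, func, select)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Device(Base):
    __tablename__ = 'devices'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

class Event(Base):
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True)
    created_on = Column(DateTime, nullable=False, index=True)
    device_id = Column(Integer, ForeignKey('devices.id'))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    dev = Device(name='Device 001')
    session.add(dev)
    session.flush()  # assign dev.id
    # Three events in the same hour (15 Jan, 16:00-16:59).
    session.add_all(
        Event(created_on=datetime(2024, 1, 15, 16, m), device_id=dev.id)
        for m in range(3)
    )
    session.commit()

    # Subquery: top 20 (device, hour) buckets by event count.
    hour_col = func.strftime('%d/%m %H', Event.created_on).label('event_hour')
    event_count_col = func.count(Event.id).label('event_count')
    sub_q = (
        select(event_count_col, hour_col, Event.device_id)
        .group_by(hour_col, Event.device_id)
        .order_by(event_count_col.desc())
        .limit(20)
        .subquery()
    )
    # Outer query: join the small top-20 result to devices for the names.
    results = session.execute(
        select(sub_q.c.event_count, sub_q.c.event_hour, Device.name)
        .join_from(sub_q, Device, sub_q.c.device_id == Device.id)
        .order_by(sub_q.c.event_count.desc(), Device.name)
    ).all()
    print(results)  # [(3, '15/01 16', 'Device 001')]
```

The point of the rewrite is that the expensive aggregation runs once in the subquery over the narrow events table, and the join to devices only touches the 20 surviving rows.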
Note: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.