
Connection compression with SQLAlchemy

I am working on a project in which multiple remote devices upload data to a single MySQL database. (Some of these devices use a cellular modem with a data cap.)

Every 2 seconds, each device uploads a row of power-usage data consisting of 126 floating-point columns with 4 decimal places of precision (xx.1234).

Additionally, a 208-column set of averages is uploaded around 2,200 times a day, at 1, 5 and 15 minute intervals. I could probably just calculate these from the 2-second data, but it was much easier and less CPU intensive to do the calculations on the raw data in Python, and this is just for testing out the idea.

The highest data usage per day I've seen is 130 MiB.

CSV versions of single rows:

# id, dtime, hz, min_v1, avg_v1, max_v1, min_v2, avg_v2, max_v2, min_v3, avg_v3, max_v3, min_i1, avg_i1, max_i1, min_i2, avg_i2, max_i2, min_i3, avg_i3, max_i3, i_n, l1_kw_pa, l2_kw_pb, l3_kw_pc, avg_kw_t, l1_kvar_qa, l2_kvar_qb, l3_kvar_qc, avg_kvar_t, l1_kva_sa, l2_kva_sb, l3_kva_sc, avg_kva_t, l1_pf_pfa, l2_pf_pfb, l3_pf_pfc, avg_pf_t, power_dmd, kvar_dmd, kva_dmd, kwh_imp, kvarh_imp, kwh_t, kvarh_t, kvah_t, v1_thd, v2_thd, v3_thd, i1_thd, i2_thd, i3_thd, p_seq_real_v, p_seq_comp_v, n_seq_real_v, n_seq_comp_v, z_seq_real_v, z_seq_comp_v, p_seq_real_i, p_seq_comp_i, n_seq_real_i, n_seq_comp_i, z_seq_real_i, z_seq_comp_i, v2_pa, v3_pa, i1_pa, i2_pa, i3_pa, vh1_2, vh1_3, vh1_5, vh1_7, vh1_9, vh1_11, vh1_13, vh1_odd, vh1_even, vh1_cf, vh2_2, vh2_3, vh2_5, vh2_7, vh2_9, vh2_11, vh2_13, vh2_odd, vh2_even, vh2_cf, vh3_2, vh3_3, vh3_5, vh3_7, vh3_9, vh3_11, vh3_13, vh3_odd, vh3_even, vh3_cf, ih1_3, ih1_5, ih1_7, ih1_9, ih1_11, ih1_13, ih1_odd, ih1_even, ih1_kf, ih2_3, ih2_5, ih2_7, ih2_9, ih2_11, ih2_13, ih2_odd, ih2_even, ih2_kf, ih3_3, ih3_5, ih3_7, ih3_9, ih3_11, ih3_13, ih3_odd, ih3_even, ih3_kf
1, 2015-03-09 20:12:05, 59.97, 123.1, 122.992, 123.1, 122.5, 122.381, 122.5, 121.8, 121.749, 121.9, 0, 1.91508, 0, 0, 13.4917, 0, 0, 7.38669, 0, 19.9551, -5.54378, 226.589, 127.961, 348.94, 235.676, 1631.89, -887.699, 978.145, 235.981, 1650.68, 899.93, 2785.75, -0.02348, 0.13701, 0.14203, 0.125, 47.335, 1299.89, 3203.01, 1272600, 863619, 1272850, 863720, 1846930, 0.0148, 0.0148, 0.0123, , , , 122.4, 0.2, 0.1, 0.1, 0.6, -0.1, 0, 0, 0, 0, 0, 0, 119.9, 240.2, 92, 203.8, 160.1, , , , , , , , , , 1.428, , , , , , , , , , 1.434, , , , , , , , , , 1.427, , , , , , , , , , , , , , , , , , , , , , , , , , , 

# id, dtime, min_hz, avg_hz, max_hz, min_min_v1, avg_avg_v1, max_max_v1, min_min_v2, avg_avg_v2, max_max_v2, min_min_v3, avg_avg_v3, max_max_v3, min_min_i1, avg_avg_i1, max_max_i1, min_min_i2, avg_avg_i2, max_max_i2, min_min_i3, avg_avg_i3, max_max_i3, min_i_n, avg_i_n, max_i_n, min_l1_kw_pa, avg_l1_kw_pa, max_l1_kw_pa, min_l2_kw_pb, avg_l2_kw_pb, max_l2_kw_pb, min_l3_kw_pc, avg_l3_kw_pc, max_l3_kw_pc, min_avg_kw_t, avg_avg_kw_t, max_avg_kw_t, min_l1_kvar_qa, avg_l1_kvar_qa, max_l1_kvar_qa, min_l2_kvar_qb, avg_l2_kvar_qb, max_l2_kvar_qb, min_l3_kvar_qc, avg_l3_kvar_qc, max_l3_kvar_qc, min_avg_kvar_t, avg_avg_kvar_t, max_avg_kvar_t, min_l1_kva_sa, avg_l1_kva_sa, max_l1_kva_sa, min_l2_kva_sb, avg_l2_kva_sb, max_l2_kva_sb, min_l3_kva_sc, avg_l3_kva_sc, max_l3_kva_sc, min_avg_kva_t, avg_avg_kva_t, max_avg_kva_t, min_l1_pf_pfa, avg_l1_pf_pfa, max_l1_pf_pfa, min_l2_pf_pfb, avg_l2_pf_pfb, max_l2_pf_pfb, min_l3_pf_pfc, avg_l3_pf_pfc, max_l3_pf_pfc, min_avg_pf_t, avg_avg_pf_t, max_avg_pf_t, max_power_dmd, max_kvar_dmd, max_kva_dmd, max_kwh_imp, max_kvarh_imp, max_kwh_t, max_kvarh_t, max_kvah_t, min_v1_thd, avg_v1_thd, max_v1_thd, min_v2_thd, avg_v2_thd, max_v2_thd, min_v3_thd, avg_v3_thd, max_v3_thd, min_i1_thd, avg_i1_thd, max_i1_thd, min_i2_thd, avg_i2_thd, max_i2_thd, min_i3_thd, avg_i3_thd, max_i3_thd, p_seq_real_v, p_seq_comp_v, n_seq_real_v, n_seq_comp_v, z_seq_real_v, z_seq_comp_v, p_seq_real_i, p_seq_comp_i, n_seq_real_i, n_seq_comp_i, z_seq_real_i, z_seq_comp_i, v2_pa, v3_pa, i1_pa, i2_pa, i3_pa, vh1_2, vh1_3, vh1_5, vh1_7, vh1_9, vh1_11, vh1_13, min_vh1_odd, avg_vh1_odd, max_vh1_odd, min_vh1_even, avg_vh1_even, max_vh1_even, min_vh1_cf, avg_vh1_cf, max_vh1_cf, vh2_2, vh2_3, vh2_5, vh2_7, vh2_9, vh2_11, vh2_13, min_vh2_odd, avg_vh2_odd, max_vh2_odd, min_vh2_even, avg_vh2_even, max_vh2_even, min_vh2_cf, avg_vh2_cf, max_vh2_cf, vh3_2, vh3_3, vh3_5, vh3_7, vh3_9, vh3_11, vh3_13, min_vh3_odd, avg_vh3_odd, max_vh3_odd, min_vh3_even, avg_vh3_even, max_vh3_even, min_vh3_cf, avg_vh3_cf, max_vh3_cf, ih1_3, ih1_5, ih1_7, ih1_9, ih1_11, ih1_13, min_ih1_odd, avg_ih1_odd, max_ih1_odd, min_ih1_even, avg_ih1_even, max_ih1_even, min_ih1_kf, avg_ih1_kf, max_ih1_kf, ih2_3, ih2_5, ih2_7, ih2_9, ih2_11, ih2_13, min_ih2_odd, avg_ih2_odd, max_ih2_odd, min_ih2_even, avg_ih2_even, max_ih2_even, min_ih2_kf, avg_ih2_kf, max_ih2_kf, ih3_3, ih3_5, ih3_7, ih3_9, ih3_11, ih3_13, min_ih3_odd, avg_ih3_odd, max_ih3_odd, min_ih3_even, avg_ih3_even, max_ih3_even, min_ih3_kf, avg_ih3_kf, max_ih3_kf
1, 2015-03-25 12:05:03, 59.9351, 59.9515, 59.9651, 123, 123.165, 123.5, 122.2, 122.379, 122.7, 121.9, 121.986, 122.3, 0, 0, 0, 0, 22.8891, 0, 0, 6.69319, 0, 0, 0, 0, 0, 0, 0, 2689.78, 2741.23, 2827.1, 761.323, 767.21, 775.285, 3455.01, 3509.49, 3597.13, 0, 0, 0, -47.318, -29.5382, -16.3021, 142.391, 147.547, 152.868, 97.1515, 117.985, 131.682, 0, 0, 0, 2744.95, 2800.07, 2883.35, 799.135, 818.643, 854.545, 3545.14, 3619.77, 3699.94, 0.99903, 0.99903, 0.99903, 0.97461, 0.978048, 0.98087, 0.8978, 0.936465, 0.95534, 0.95832, 0.968614, 0.97364, 3497.05, 154.864, 3610.55, 529.2, 46.8, 529.8, 47.3, 568.1, 0.0147, 0.0149417, 0.0153, 0.0155, 0.0158617, 0.0164, 0.0138, 0.0141883, 0.0149, 0, 0, 0, 0, 0, 0, 0, 0, 0, 122.452, -0.0783333, 0.241667, 0.103333, 0.308333, -0.0666667, 0, 0, 0, 0, 0, 0, 119.985, 240.133, 0, 121.318, 250.655, 0, 0.01061, 0, 0.003965, 0, 0, 0, 0.0131, 0.013455, 0.014, 0.0058, 0.006575, 0.007, 1.429, 1.43403, 1.435, 0, 0.0117267, 0, 0.00466333, 0, 0, 0, 0.0142, 0.0146267, 0.0152, 0.0055, 0.00624333, 0.0066, 1.435, 1.43673, 1.438, 0, 0.0110717, 0, 0, 0, 0, 0, 0.0128, 0.0132333, 0.0143, 0.0042, 0.00523333, 0.0059, 1.428, 1.42912, 1.43, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

My current setup uses SQLAlchemy and python-MySQLdb to communicate with the database. I need to find a way to cut down on the data usage, and if possible do it without sending data less frequently.

In case it matters, each remote device is a small Raspberry Pi-like computer.

For reference, this is a link to my development web server, where you can see what the point of this project is: http://104.131.181.35/live/voltsandamps. It's still a work in progress.

For the past couple of days I have been researching the MySQL compression protocol, but I haven't found a way to enable it with SQLAlchemy or any other Python database connector.

I know that python-MySQLdb has a compress flag, but I can't figure out how to use it, especially while it's being used as the driver for SQLAlchemy, which I'm willing to drop if necessary.

Is this possible? And if not, is there another solution that would work better for this?

Any help would be appreciated.

EDIT:

I ended up writing a web service like what @paidhima recommended. Each device sends data once every 1-30 minutes. The data format is basically a compressed JSON string with a version, time-stamp(s), and array(s) of values. The web server then decompresses the received data and inserts it into the DB.

When I first asked this question I knew next to nothing about databases and web development in general; it's funny to look back a year. Final results with the web service: I achieved about a 10:1 compression ratio (between 88 and 92%), and each device averages about 10-15 MB per day. After a short time having remote devices connect directly to the server, I began to see that this wasn't a solution suited for anything other than a development environment. Security concerns, firewall issues, too many dropped connections and just general research led me to drop the direct connection and write a simple web service.
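Roughly, the client side of that upload format looks like the sketch below; the endpoint URL and field names here are illustrative rather than my actual code:

import json
import time
import zlib
import urllib.request

# One upload: a format version, timestamp(s) and array(s) of readings,
# serialised as JSON and compressed before it crosses the cellular link.
payload = {
    "version": 1,
    "timestamps": [time.time()],
    "rows": [[59.97, 123.1, 122.992]],   # truncated example readings
}
body = zlib.compress(json.dumps(payload).encode("utf-8"))

# POST the compressed blob; the web service decompresses it and performs
# the actual INSERTs into MySQL on the server side.
req = urllib.request.Request(
    "http://example.com/api/upload",      # illustrative endpoint
    data=body,
    headers={"Content-Type": "application/octet-stream"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)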

With the direct DB connection, I was able to get it down to about 70-80 MB a day per device. That's with prepared statements and connection compression enabled.

The web service is written in Python and, without the database interaction code, is only about 250 lines of code (for client and server). Thanks for your excellent advice @eggyal and @paidhima. I'm completely self-taught, and have only been able to get as far as I have thanks to individuals like you who contribute advice and answer questions.

Identifying Potential Improvements

If you inspect the packets that are transmitted to MySQL upon the insertion of each record (which should be easy to do with a mere packet sniffer, since the communications are neither encrypted nor compressed unless you're connecting over SSL), you will notice:

  1. The SQL INSERT statement, which includes the full list of column names, is transmitted every time.

  2. The floating-point values are transmitted as strings, requiring up to 36 characters each.

Both result in significant unnecessary network utilisation that would be avoided by using MySQL's binary prepared statement protocol instead (the SQL command could be sent to the server only once, and thereafter only data values would be transmitted, in their respective storage formats, for each insertion attempt).

Of the MySQL drivers that are supported by SQLAlchemy, only Oracle's "official" one (MySQL Connector/Python) provides an API for this functionality (whilst oursql also uses the protocol, it doesn't reuse statements that are sent repeatedly).

SQLAlchemy

Unfortunately, SQLAlchemy's mysqlconnector dialect does not currently utilise these features.

Whilst there are still some things you can do in SQLAlchemy to reduce network utilisation (for example, in Core you could prevent the full list of column names being transmitted), the reality is that you won't ever do as well as could be achieved with the binary prepared statement protocol.
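For illustration, here is a minimal sketch of what is possible while staying on SQLAlchemy with the python-MySQLdb driver: the driver's compress flag (the one mentioned in the question) is passed through connect_args, and a textual INSERT that omits the column list keeps the repeated statement text short. The connection URL, credentials and table layout (an auto-increment id, a timestamp and 126 readings) are assumptions made for the example:

from datetime import datetime
import random

from sqlalchemy import create_engine, text

# Pass the driver's compress flag through to python-MySQLdb so the MySQL
# protocol compresses each packet (credentials and host are placeholders).
engine = create_engine(
    "mysql+mysqldb://pi_user:secret@mysql.example.com/voltsandamps",
    connect_args={"compress": True},
)

NUM_READINGS = 126
# A textual INSERT with no column list: only the placeholders (and later the
# bound values) travel with each statement, not the long column-name list.
placeholders = ", ".join(f":p{i}" for i in range(NUM_READINGS + 2))  # id, dtime, readings
stmt = text(f"INSERT INTO power_readings VALUES ({placeholders})")

# Dummy row for illustration: NULL id (auto-increment), timestamp, 126 readings.
row = [None, datetime.utcnow()] + [round(random.uniform(0.0, 130.0), 4)
                                   for _ in range(NUM_READINGS)]

with engine.begin() as conn:
    conn.execute(stmt, {f"p{i}": v for i, v in enumerate(row)})

The values still travel as text, so this remains inferior to the binary protocol, but it trims both of the overheads identified above.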

Recommendations

Therefore I recommend either:

  • extending SQLAlchemy's mysqlconnector dialect to support such functionality (more work, but of considerable value to the community at large); or

  • dropping SQLAlchemy (at least for these insertion operations) and instead using the MySQL Connector/Python driver directly.

In pursuing either approach, you could also enable packet compression at the same time.

Example

import time
import mysql.connector

cnx = mysql.connector.connect(user='raspberryPi_1234',
                              password='foobar',
                              host='mysql.example.com',
                              database='voltsandamps',
                              autocommit=True,
                              compress=True)  # compress the connection

cursor = cnx.cursor(prepared=True)            # this is what SQLAlchemy is missing

# The column list never needs repeating: after the first execution, only the
# 126 values are sent, using the binary prepared statement protocol.
stmt = "INSERT INTO power_readings VALUES (" + ",".join(126*["?"]) + ")"

while True:
    # getPowerReadings() is a placeholder for whatever yields one row of readings
    cursor.execute(stmt, getPowerReadings())
    time.sleep(2)

Final Thoughts

If you need to reduce network utilisation yet further, you might consider using stored procedures to encapsulate your INSERT command, not only because a CALL myProc(...) command will almost always be shorter than the underlying INSERT command, but also because it enables one to adopt some extremely aggressive techniques, including:

  • rebasing your data: if values tend to fall within a certain range, you need only transmit the offset from the base of that range (which might permit use of a smaller data type during transmission); rebasing to the actual value could then be performed within the stored procedure (and the base itself could be set using a user-defined variable); and

  • in extremis, one could compress one's data on the client and pack it into a binary string, then unpack and decompress at the server, thus maximising the usage of every last bit (also, a single string value would incur less management overhead than multiple separate values of the same aggregate length); a rough client-side sketch follows below.
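As a sketch of that last idea: the stored procedure name insert_packed_reading and its signature are hypothetical, and the 4-byte length prefix is added because that is the format MySQL's UNCOMPRESS() expects on the server side.

import struct
import zlib

import mysql.connector

# 126 readings packed as little-endian 32-bit floats: 504 bytes before
# compression, instead of several hundred bytes of decimal text per row.
readings = [59.97, 123.1, 122.992] + [0.0] * 123
base = 0.0                                         # optional rebasing offset
packed = struct.pack("<126f", *(r - base for r in readings))

# MySQL's UNCOMPRESS() expects the uncompressed length as a 4-byte
# little-endian prefix, as produced by COMPRESS().
blob = struct.pack("<I", len(packed)) + zlib.compress(packed)

cnx = mysql.connector.connect(user='raspberryPi_1234', password='foobar',
                              host='mysql.example.com', database='voltsandamps',
                              compress=True)
cursor = cnx.cursor()
# Hypothetical procedure: it would UNCOMPRESS() the blob, unpack the floats,
# re-add the base offset and INSERT them into the real columns.
cursor.execute("CALL insert_packed_reading(%s)", (blob,))
cnx.commit()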
