Does the number of columns affect the speed of SQLAlchemy?

I created two tables with SQLAlchemy (Python 2.7); the database is MySQL 5.5. The following is my code:

from sqlalchemy import create_engine, MetaData, Table, Column, SmallInteger
from sqlalchemy.dialects.mysql import TINYINT, BINARY

engine = create_engine('mysql://root:123@localhost/test')
metadata = MetaData()
conn = engine.connect()

# For table 1:
columns = []
for i in xrange(100):
    columns.append(Column('c%d' % i, TINYINT, nullable = False, server_default = '0'))
    columns.append(Column('d%d' % i, SmallInteger, nullable = False, server_default = '0'))

user = Table('user', metadata, *columns)
# So user has 100 tinyint columns and 100 smallint columns.

# For table 2:
user2 = Table('user2', metadata,
        Column('c0', BINARY(100), nullable = False, server_default='\0'*100),
        Column('d0', BINARY(200), nullable = False, server_default='\0'*200))
# user2 has two columns containing 100 bytes and 200 bytes respectively.

I then inserted 4000 rows into each table. Since the two tables have the same row length, I
expected the select speed to be almost the same. I ran the following test code:

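from time import time
from sqlalchemy import select

s1 = select([user]).compile(engine)
s2 = select([user2]).compile(engine)

# Time a full fetch of the 200-column table.
t1 = time()
result = conn.execute(s1).fetchall()
print 't1:', time() - t1

# Time a full fetch of the 2-column table.
t2 = time()
result = conn.execute(s2).fetchall()
print 't2:', time() - t2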

The result is:

t1: 0.5120000

t2: 0.0149999

Does this mean the number of columns in a table will dramatically affect the performance of SQLAlchemy? Thank you in advance!

Does this mean the number of columns in a table will dramatically affect the performance of SQLAlchemy?

Well, that's a tough one, and it probably depends more on the underlying SQL engine, MySQL in this case, than on SQLAlchemy itself, which is nothing more than a way to interact with different database engines through the same interface.

SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.

It provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.

Though I could be wrong, you could try benchmarking it using regular SQL.

I actually ran some tests ...

import timeit

setup = """
from sqlalchemy import create_engine, MetaData, select, Table, Column
from sqlalchemy.dialects.sqlite import BOOLEAN, SMALLINT, VARCHAR
engine = create_engine('sqlite://', echo = False)
metadata = MetaData()
conn = engine.connect()
columns = []

for i in xrange(100):
    columns.append(Column('c%d' % i, VARCHAR(1), nullable = False, server_default = '0'))
    columns.append(Column('d%d' % i, VARCHAR(2), nullable = False, server_default = '00'))

user = Table('user', metadata, *columns)
user2 = Table('user2', metadata, Column('c0', VARCHAR(100), nullable = False, server_default = '0' * 100),
                                 Column('d0', VARCHAR(200), nullable = False, server_default = '0' * 200))

# Create both tables before inserting into them.
metadata.create_all(conn)
conn.execute(user.insert(), [{}] * 4000)
conn.execute(user2.insert(), [{}] * 4000)
"""

# Select through SQLAlchemy's expression language.
many_columns = """
s1 = select([user]).compile(engine)
result = conn.execute(s1).fetchall()
"""

two_columns = """
s2 = select([user2]).compile(engine)
result = conn.execute(s2).fetchall()
"""

# Same queries as raw SQL strings.
raw_many_columns = "res = conn.execute('SELECT * FROM user').fetchall()"
raw_two_columns = "res = conn.execute('SELECT * FROM user2').fetchall()"

timeit.Timer(two_columns, setup).timeit(number = 1)
timeit.Timer(raw_two_columns, setup).timeit(number = 1)
timeit.Timer(many_columns, setup).timeit(number = 1)
timeit.Timer(raw_many_columns, setup).timeit(number = 1)

>>> timeit.Timer(two_columns, setup).timeit(number = 1)
>>> timeit.Timer(raw_two_columns, setup).timeit(number = 1)
>>> timeit.Timer(many_columns, setup).timeit(number = 1)
>>> timeit.Timer(raw_many_columns, setup).timeit(number = 1)

I did find this:
http://www.mysqlperformanceblog.com/2009/09/28/how-number-of-columns-affects-performance/

which was kind of interesting, though he used max for testing ...

I really do love SQLAlchemy, so I decided to compare it using Python's own sqlite3 module:

import timeit

setup = """
import sqlite3
conn = sqlite3.connect(':memory:')
c = conn.cursor()

# 200-column table: c0..c99 and d0..d99.
c.execute('CREATE TABLE user (%s)' %\
          ("".join(("c%i VARCHAR(1) DEFAULT '0' NOT NULL, d%i VARCHAR(2) DEFAULT '00' NOT NULL," % (index, index) for index in xrange(99))) +\
           "c99 VARCHAR(1) DEFAULT '0' NOT NULL, d99 VARCHAR(2) DEFAULT '00' NOT NULL"))

# 2-column table.
c.execute("CREATE TABLE user2 (c0 VARCHAR(100) DEFAULT '%s' NOT NULL, d0 VARCHAR(200) DEFAULT '%s' NOT NULL)" % ('0'* 100, '0'*200))

c.executemany('INSERT INTO user VALUES (%s)' % ('?,' * 199 + '?'), [('0',) * 200] * 4000)
c.executemany('INSERT INTO user2 VALUES (?,?)', [('0'*100, '0'*200)] * 4000)
"""

many_columns = """
r = c.execute('SELECT * FROM user')
rows = r.fetchall()
"""

two_columns = """
r2 = c.execute('SELECT * FROM user2')
rows = r2.fetchall()
"""

timeit.Timer(many_columns, setup).timeit(number = 1)
timeit.Timer(two_columns, setup).timeit(number = 1)

>>> timeit.Timer(many_columns, setup).timeit(number = 1)
>>> timeit.Timer(two_columns, setup).timeit(number = 1)

and came up with the same result, so I really do think it's a database implementation issue, not a SQLAlchemy issue.
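One plausible contributor to the gap, even with the raw driver, is that the same number of bytes is split into far more individual values, each of which has to be turned into its own Python object. The sketch below illustrates that by counting values rather than timing anything. It assumes Python 3 and only the standard sqlite3 module; the table names wide and narrow and the helpers build_tables and count_cells are made up for this illustration and do not come from the benchmarks above.

```python
import sqlite3

ROWS = 4000  # same row count as in the question


def build_tables(conn):
    """Create a 200-column table and a 2-column table and fill both."""
    wide_cols = ", ".join(
        "c%d VARCHAR(1) NOT NULL, d%d VARCHAR(2) NOT NULL" % (i, i)
        for i in range(100)
    )
    conn.execute("CREATE TABLE wide (%s)" % wide_cols)
    conn.execute(
        "CREATE TABLE narrow (c0 VARCHAR(100) NOT NULL, d0 VARCHAR(200) NOT NULL)"
    )
    # 200 placeholders for the wide table, 2 for the narrow one.
    placeholders = ", ".join("?" * 200)
    conn.executemany(
        "INSERT INTO wide VALUES (%s)" % placeholders,
        [("0", "00") * 100] * ROWS,
    )
    conn.executemany(
        "INSERT INTO narrow VALUES (?, ?)",
        [("0" * 100, "0" * 200)] * ROWS,
    )


def count_cells(conn, table):
    """Return how many individual values a full fetch of the table yields."""
    rows = conn.execute("SELECT * FROM %s" % table).fetchall()
    return sum(len(row) for row in rows)


conn = sqlite3.connect(":memory:")
build_tables(conn)
wide_cells = count_cells(conn, "wide")
narrow_cells = count_cells(conn, "narrow")
print("wide:", wide_cells, "narrow:", narrow_cells)
```

Both tables carry 300 characters per row, but a full fetch of the wide one produces a hundred times as many values, so any fixed per-value cost in the driver or in a layer on top of it is paid a hundred times as often.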


import timeit

setup = """
from sqlalchemy import create_engine, MetaData, select, Table, Column
from sqlalchemy.dialects.sqlite import BOOLEAN, SMALLINT, VARCHAR
engine = create_engine('sqlite://', echo = False)
metadata = MetaData()
conn = engine.connect()
columns = []

for i in xrange(100):
    columns.append(Column('c%d' % i, VARCHAR(1), nullable = False, server_default = '0'))
    columns.append(Column('d%d' % i, VARCHAR(2), nullable = False, server_default = '00'))

user = Table('user', metadata, *columns)

user2 = Table('user2', metadata, Column('c0', VARCHAR(100), nullable = False, server_default = '0' * 100),
                                 Column('d0', VARCHAR(200), nullable = False, server_default = '0' * 200))

# Create both tables so the timed inserts have something to write to.
metadata.create_all(conn)
"""

many_columns = """
conn.execute(user.insert(), [{}] * 4000)
"""

two_columns = """
conn.execute(user2.insert(), [{}] * 4000)
"""

>>> timeit.Timer(two_columns, setup).timeit(number = 1)
>>> timeit.Timer(many_columns, setup).timeit(number = 1)

Testing with the sqlite3 module:

import timeit

setup = """
import sqlite3
conn = sqlite3.connect(':memory:')
c = conn.cursor()

# 200-column table: c0..c99 and d0..d99.
c.execute('CREATE TABLE user (%s)' %\
    ("".join(("c%i VARCHAR(1) DEFAULT '0' NOT NULL, d%i VARCHAR(2) DEFAULT '00' NOT NULL," % (index, index) for index in xrange(99))) +\
            "c99 VARCHAR(1) DEFAULT '0' NOT NULL, d99 VARCHAR(2) DEFAULT '00' NOT NULL"))

# 2-column table.
c.execute("CREATE TABLE user2 (c0 VARCHAR(100) DEFAULT '%s' NOT NULL, d0 VARCHAR(200) DEFAULT '%s' NOT NULL)" % ('0'* 100, '0'*200))
"""

many_columns = """
c.executemany('INSERT INTO user VALUES (%s)' % ('?,' * 199 + '?'), [('0', '00') * 100] * 4000)
"""

two_columns = """
c.executemany('INSERT INTO user2 VALUES (?,?)', [('0'*100, '0'*200)] * 4000)
"""

timeit.Timer(many_columns, setup).timeit(number = 1)
timeit.Timer(two_columns, setup).timeit(number = 1)

>>> timeit.Timer(many_columns, setup).timeit(number = 1)
>>> timeit.Timer(two_columns, setup).timeit(number = 1)

Samy.vilar's answer is excellent. But one key thing to remember is that the number of columns will have an impact on the performance of any database and any ORM. The more columns you have, the more data is being accessed from disk and transferred.

Also, depending on the query and table structures, adding more columns could change a query from being covered by an index to being forced to access the base table, which can add substantial time under certain databases and certain circumstances.
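To make the covering-index point concrete, here is a small sketch using SQLite's EXPLAIN QUERY PLAN from Python 3's standard sqlite3 module. The orders table, its columns, the index name, and the query_plan helper are all hypothetical names chosen for this illustration, and other databases report their plans differently.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the description text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last field of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer INTEGER, total INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX idx_customer_total ON orders (customer, total)")

# Every requested column lives in the index, so the base table is not needed.
covered = query_plan(
    conn, "SELECT customer, total FROM orders WHERE customer = 1"
)

# Asking for one more column (note) forces a lookup into the base table.
not_covered = query_plan(
    conn, "SELECT customer, total, note FROM orders WHERE customer = 1"
)

print(covered)
print(not_covered)
```

The first plan should mention a covering index while the second should not, which is the switch described above: one extra column in the select list is enough to send the query back to the base table.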

I have only played with SQLAlchemy a little bit, but as a DBA I generally advise the developers I work with to query only the columns they need and to avoid "select *" in production code, both because it is likely to return more columns than needed and because it makes the code more brittle when columns are later added to the table or view.
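A minimal sketch of that advice, again assuming Python 3 and the standard sqlite3 module rather than the MySQL setup from the question; the four-column user table and the added e0 column are invented for the example. With the SQLAlchemy version used in the question, the corresponding change would be to pass specific columns, such as select([user.c.c0, user.c.d0]), instead of select([user]).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (c0 TEXT, d0 TEXT, c1 TEXT, d1 TEXT)")
conn.execute("INSERT INTO user VALUES ('0', '00', '0', '00')")

# Name only the columns the caller actually uses.
needed = conn.execute("SELECT c0, d0 FROM user").fetchone()

# Simulate a later schema change that adds a column.
conn.execute("ALTER TABLE user ADD COLUMN e0 TEXT")

# The explicit query keeps its shape after the schema change...
needed_after = conn.execute("SELECT c0, d0 FROM user").fetchone()
# ...while SELECT * silently starts returning the extra column.
star_after = conn.execute("SELECT * FROM user").fetchone()

print(len(needed), len(needed_after), len(star_after))
```

Code that unpacks rows positionally keeps working with the explicit column list, whereas the "select *" version changes width as soon as someone alters the table.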
