sqlalchemy bulk insert is slower than building raw SQL

I'm going through this article on SQLAlchemy bulk insert performance. I tried the approaches from its benchmark, including SQLAlchemy ORM bulk_insert_mappings() and SQLAlchemy Core. Unfortunately, each of them took about 1 minute to insert 1000 rows, which is horrendously slow. I also tried the approach specified here, which requires building one large SQL statement like:

INSERT INTO mytable (col1, col2, col3)
VALUES (1,2,3), (4,5,6) ..... --- up to 1000 of these

And the insert for this raw SQL is something like:

MySession.execute('''
insert into MyTable (e, l, a)
values {}
'''.format(",".join(my_insert_str)))

Using this approach I improved the performance more than 50x, to 10000 insertions in 10-11 seconds.

Here is the code for the approach using the built-in library.

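from sqlalchemy import Column, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyClass(Base):
    __tablename__ = "MyTable"
    e = Column(String(256), primary_key=True)
    l = Column(String(6))
    a = Column(String(20), primary_key=True)

    def __repr__(self):
        return self.e + " " + self.a + " " + self.l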

....... .......

        dict_list = []
        for i, row in chunk.iterrows():
            dict_list.append({"e": row["e"], "l": l, "a": a})

        MySession.execute(
            MyClass.__table__.insert(),
            dict_list
        )

Here is how I connect to the database.

    params = urllib.quote_plus("DRIVER={SQL Server Native Client 10.0};SERVER=servername;DATABASE=dbname;UID=user;PWD=pass")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params )
    MySession.configure(bind=engine, autoflush=False, expire_on_commit=False)

Is there an issue with my setup that degrades the performance so much? I tried different DB drivers, pyodbc and pymssql. Whatever I try, I cannot get anywhere close to the numbers claimed in the article, namely:

SQLAlchemy ORM: Total time for 100000 records 2.192882061 secs
SQLAlchemy ORM pk given: Total time for 100000 records 1.41679310799 secs
SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 0.494568824768 secs
SQLAlchemy ORM bulk_insert_mappings(): Total time for 100000 records 0.325763940811 secs
SQLAlchemy Core: Total time for 100000 records 0.239127874374 secs
sqlite3: Total time for 100000 records 0.124729156494 sec

I'm connecting to MS SQL Server 2008. Let me know if I've missed any other details.

The problem with the raw SQL approach is that it's not safe against SQL injection. So if you have suggestions for how to solve that issue, they would also be very helpful :).

You're doing

MySession.execute(
    MyClass.__table__.insert(),
    dict_list
)

which uses executemany(). That is not the same as a single INSERT INTO ... VALUES ... statement carrying many rows: with executemany(), the driver typically executes a one-row INSERT once per parameter set. To use VALUES, do:

MySession.execute(
    MyClass.__table__.insert().values(dict_list)
)
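
The difference is easiest to see at the DBAPI level. The following is only a minimal sketch, not the asker's setup: it uses the standard-library sqlite3 module as a stand-in for pyodbc, and the table name my_table and the helper count_inserts are made up for illustration. It counts how many INSERT statements the database actually runs for each call shape.

```python
import sqlite3

rows = [("e1", "l1", "a1"), ("e2", "l2", "a2"), ("e3", "l3", "a3")]


def count_inserts(run):
    """Run `run(conn)` on a fresh in-memory DB; return (INSERTs executed, rows stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (e TEXT, l TEXT, a TEXT)")
    seen = []
    # The trace callback receives each statement SQLite actually executes.
    conn.set_trace_callback(seen.append)
    run(conn)
    conn.set_trace_callback(None)
    stored = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
    conn.close()
    inserts = [s for s in seen if s.lstrip().upper().startswith("INSERT")]
    return len(inserts), stored


def with_executemany(conn):
    # One single-row INSERT, executed once per parameter set.
    conn.executemany("INSERT INTO my_table (e, l, a) VALUES (?, ?, ?)", rows)


def with_multirow_values(conn):
    # One INSERT carrying every row in a single VALUES list.
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    flat = [value for row in rows for value in row]
    conn.execute(f"INSERT INTO my_table (e, l, a) VALUES {placeholders}", flat)


executemany_result = count_inserts(with_executemany)
multirow_result = count_inserts(with_multirow_values)
print(executemany_result, multirow_result)
```

Both variants store the same rows; only the number of statements differs. Against a networked database each execution can cost a round trip, depending on the driver, which is a plausible source of the slowdown described in the question.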

As a side note, the SQL injection problem is solved by using parameters:

MySession.execute('''
insert into MyTable (e, l, a)
values (?, ?, ?), (?, ?, ?), ...
''', params)
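
One way to build such a statement without interpolating values into the SQL text is sketched below. The helper name multirow_insert_chunks and its defaults are assumptions for illustration: max_params=2000 is chosen to stay under SQL Server's limit of 2100 parameters per request, and max_rows=1000 reflects its cap of 1000 rows in a single VALUES list. Table and column names are still formatted into the string, so they must come from trusted code, never from user input.

```python
def multirow_insert_chunks(table, columns, rows, max_params=2000, max_rows=1000):
    """Yield (sql, params) pairs, each a parameterized multi-row INSERT.

    `table` and `columns` are trusted identifiers; only the row values
    travel as bound parameters ("?" is the qmark style pyodbc uses).
    """
    rows_per_stmt = max(1, min(max_rows, max_params // len(columns)))
    row_placeholder = "(" + ", ".join("?" * len(columns)) + ")"
    column_list = ", ".join(columns)
    for start in range(0, len(rows), rows_per_stmt):
        chunk = rows[start:start + rows_per_stmt]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, column_list, ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the chunk so values line up with the placeholders in order.
        params = [value for row in chunk for value in row]
        yield sql, params


# 1500 three-column rows would need 4500 parameters, so they get split.
demo_rows = [("e%d" % i, "l", "a%d" % i) for i in range(1500)]
statements = list(multirow_insert_chunks("MyTable", ["e", "l", "a"], demo_rows))
print([len(params) for _, params in statements])
```

Each (sql, params) pair can then be handed to a DBAPI cursor's execute(); how positional parameters are passed through a SQLAlchemy session varies by version, so that part is left out here.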

The takeaway here is that you're not comparing equivalent constructs. The SQLAlchemy-generated query is parameterized but doesn't use a multi-row VALUES list; your textual SQL uses a multi-row VALUES list but isn't parameterized. If you turn on logging for the executed SQL statements, you'll see exactly what is different.
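
For reference, SQLAlchemy logs the SQL it emits through the standard logging module under the sqlalchemy.engine logger name, so a minimal setup might look like the sketch below. Passing echo=True to create_engine() is the shorthand for roughly the same thing.

```python
import logging

# Send log records to stderr, then raise the engine logger to INFO so that
# the statements SQLAlchemy emits are logged along with their parameters.
logging.basicConfig()
engine_logger = logging.getLogger("sqlalchemy.engine")
engine_logger.setLevel(logging.INFO)
```

Run this before executing the inserts, then compare the logged statement for the executemany() form with the one for the .values(dict_list) form.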
