
Dump pandas DataFrame to SQL statements

I need to convert a pandas DataFrame object into a series of SQL statements that reproduce it.

For example, suppose I have a DataFrame object:

>>> import pandas as pd
>>> df = pd.DataFrame({'manufacturer': ['Audi', 'Volkswagen', 'BMW'],
                       'model': ['A3', 'Touareg', 'X5']})
>>> df
  manufacturer    model
0         Audi       A3
1   Volkswagen  Touareg
2          BMW       X5

I need to convert it to the following SQL representation (not necessarily exactly the same):

CREATE TABLE "Auto" (
"index" INTEGER,
  "manufacturer" TEXT,
  "model" TEXT
);
INSERT INTO Auto (manufacturer, model) VALUES ('Audi', 'A3'), ('Volkswagen', 'Touareg'), ('BMW', 'X5');

Luckily, the pandas DataFrame object has a to_sql() method, which dumps the whole DataFrame to a database through an SQLAlchemy engine. I decided to use an in-memory SQLite database for this:

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)  # Setting echo=True would only log the SQL statements; I'd rather avoid parsing those logs
>>> df.to_sql(name='Auto', con=engine)

I'm stuck at this point. I can neither dump the in-memory SQLite database to SQL statements nor find an SQLAlchemy driver that would write the SQL statements to a file instead of executing them.

Is there a way to dump all queries sent to an SQLAlchemy engine as SQL statements to a file?

My inelegant solution so far:

>>> from sqlalchemy import MetaData
>>> meta = MetaData()
>>> meta.reflect(bind=engine)
>>> print(pd.io.sql.get_schema(df, name='Auto') + ';')
CREATE TABLE "Auto" (
"manufacturer" TEXT,
  "model" TEXT
);
>>> print('INSERT INTO Auto ({}) VALUES\n{};'.format(', '.join([repr(c) for c in df.columns]), ',\n'.join([str(row[1:]) for row in engine.execute(meta.tables['Auto'].select())])))
INSERT INTO Auto ('manufacturer', 'model') VALUES
('Audi', 'A3'),
('Volkswagen', 'Touareg'),
('BMW', 'X5');

I would actually prefer a solution that does not require building the SQL statements manually.

SQLite actually allows one to dump the whole database to a series of SQL statements with its .dump command. This functionality is also available in sqlite3, Python's DB-API interface for SQLite, through the connection object's iterdump() method. As far as I know, SQLAlchemy does not provide this functionality.
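To see what iterdump() produces on its own, here is a minimal sketch that uses only the standard-library sqlite3 module. The table is created by hand rather than by pandas, and the rows variable is hypothetical sample data that mirrors the DataFrame from the question.

```python
import sqlite3

# Hypothetical sample data mirroring the DataFrame in the question.
rows = [('Audi', 'A3'), ('Volkswagen', 'Touareg'), ('BMW', 'X5')]

# Build a small in-memory database by hand (no pandas involved here).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE "Auto" ("manufacturer" TEXT, "model" TEXT)')
conn.executemany('INSERT INTO "Auto" VALUES (?, ?)', rows)
conn.commit()

# iterdump() yields the dump one SQL statement at a time.
dump_lines = list(conn.iterdump())
conn.close()

for line in dump_lines:
    print(line)
```

The dump is wrapped in a transaction, so the CREATE TABLE and INSERT statements appear between a BEGIN TRANSACTION and a COMMIT line.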

Thus, to dump a pandas DataFrame to a series of SQL statements, one first needs to write it to an in-memory SQLite database and then dump that database using the iterdump() method:

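from sqlalchemy import create_engine

# Assumes df is the DataFrame, table_name is the target table name,
# and stream is a writable text file object.
engine = create_engine('sqlite://', echo=False)
df.reset_index().to_sql(name=table_name, con=engine)  # reset_index() is needed to preserve index column in dumped data

with engine.connect() as conn:
    # conn.connection is the raw DBAPI (sqlite3) connection
    for line in conn.connection.iterdump():
        stream.write(line)
        stream.write('\n')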

engine.connect().connection gives access to the raw DBAPI connection.
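As a variation on the same idea, the SQLAlchemy engine can probably be skipped altogether, since to_sql() also accepts a plain sqlite3 connection. The sketch below assumes that behavior and wraps it in a helper; the function name dataframe_to_sql is hypothetical, not part of pandas, and index=False is a choice made here to leave the index out of the dump.

```python
import sqlite3

import pandas as pd


def dataframe_to_sql(df, table_name):
    """Return the SQL statements that recreate df as table_name.

    Hypothetical helper: writes df to an in-memory SQLite database
    and returns the text produced by sqlite3's iterdump().
    """
    conn = sqlite3.connect(':memory:')
    try:
        # index=False omits the DataFrame index from the table.
        df.to_sql(name=table_name, con=conn, index=False)
        return '\n'.join(conn.iterdump())
    finally:
        conn.close()


df = pd.DataFrame({'manufacturer': ['Audi', 'Volkswagen', 'BMW'],
                   'model': ['A3', 'Touareg', 'X5']})
sql = dataframe_to_sql(df, 'Auto')
print(sql)
```

The returned string can then be written to a file or executed against another SQLite database. Other database dialects may need the type names or quoting adjusted.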
