Create a temporary table in MySQL using Pandas
Pandas has a great feature that lets you write a DataFrame to a SQL table:
df.to_sql(con=cnx, name='some_table_name', if_exists='replace', flavor='mysql', index=False)
Is there a way to make a temporary table this way?
As far as I can tell, there is nothing about this in the documentation.
DataFrame.to_sql() uses the pandas.io.sql module built into pandas, which itself relies on SQLAlchemy as a database abstraction layer. To create a "temporary" table in SQLAlchemy, you need to supply a prefix:
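from sqlalchemy import Column, Integer, MetaData, Table

metadata = MetaData()
t = Table(
    't', metadata,
    Column('id', Integer, primary_key=True),
    # ...
    prefixes=['TEMPORARY'],  # emits CREATE TEMPORARY TABLE
)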
From what I see, pandas.io.sql does not allow you to specify the prefixes or easily change the way tables are created.
One way to approach this problem would be to create the temporary table beforehand and use to_sql() with if_exists="append" (all using the same database connection).
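The pre-create-then-append idea can be sketched roughly as follows. This is a minimal illustration, not a tested MySQL recipe: it assumes an in-memory SQLite engine as a stand-in for the MySQL connection, and the table name tmp_values and its two columns are hypothetical. The point it tries to show is that the CREATE TEMPORARY TABLE, the to_sql() call, and any later query all run on one connection, since a temporary table is only visible to the session that created it.

```python
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, text

# Assumption: in-memory SQLite stands in for MySQL here; with MySQL you
# would pass your own database URL to create_engine() instead.
engine = create_engine("sqlite://")

df = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})

metadata = MetaData()
tmp = Table(
    "tmp_values",  # hypothetical table name
    metadata,
    Column("id", Integer, primary_key=True),
    Column("value", Integer),
    prefixes=["TEMPORARY"],  # emits CREATE TEMPORARY TABLE
)

# Create, fill, and query the table on the same connection.
with engine.begin() as conn:
    tmp.create(conn)
    # if_exists="append" makes pandas insert into the existing table
    # rather than issuing its own CREATE TABLE.
    df.to_sql("tmp_values", conn, if_exists="append", index=False)
    row_count = conn.execute(text("SELECT COUNT(*) FROM tmp_values")).scalar()
```

The column names in the DataFrame have to match the columns of the pre-created table, because pandas builds its INSERT statement from the frame.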
Here is also what I've tried: overriding the _create_table_setup() method of pandas.io.sql.SQLTable and passing the prefixes to the Table constructor. For some reason, the table was still created as non-temporary. Not sure if it will help, but here is the code I was using: gist. This is kind of hacky, but I hope it can at least serve as example code to get you started on this approach.
This may be a bit hacky, and it doesn't technically create a temporary table, it just acts like one, but you could use the @contextmanager decorator from contextlib to create the table upon opening the context and drop it upon close. It could look something like:
from contextlib import contextmanager

import numpy as np
import pandas as pd
import sqlalchemy as sqla


@contextmanager
def temp_table(frame, tbl, eng, *args, **kwargs):
    # Create the table on entry; drop it on exit, even if the body raises.
    frame.to_sql(tbl, eng, *args, **kwargs)
    try:
        yield
    finally:
        with eng.begin() as conn:
            conn.execute(sqla.text('DROP TABLE {}'.format(tbl)))


df = pd.DataFrame(np.random.randint(21, size=(10, 10)))
cnx = sqla.create_engine(conn_string)  # conn_string: your database URL
with temp_table(df, 'some_table_name', cnx, if_exists='replace', index=False):
    pass  # do stuff with "some_table_name"
I tested it using Teradata and it works fine. I don't have a MySQL instance lying around to test it on, but as long as DROP statements work in MySQL, it should work as intended.
This was a quick and easy workaround for me.
Simply apply a regular expression to the generated SQL to add whatever statements you want.
import io
import re

import pandas as pd

# Assumes df (a DataFrame), tmp_table_name (a str), and conn (a psycopg2
# connection) already exist.
# Get the SQL that would be generated by the create table statement
create_table_sql = pd.io.sql.get_schema(df, tmp_table_name)
# Replace the `CREATE TABLE` part of the generated statement with
# whatever you need.
create_tmp_table_sql = re.sub(
    r"^\s*CREATE TABLE",
    "CREATE TEMP TABLE",
    create_table_sql,
    count=1,
)
# Write to the database in a transaction (psycopg2)
with conn.cursor() as cur:
    cur.execute(create_tmp_table_sql)
    output = io.StringIO()
    df.to_csv(output, sep="\t", header=False, index=False, na_rep="NULL")
    output.seek(0)
    cur.copy_from(output, tmp_table_name, null="NULL")
Credit to Aseem for a fast way to write to Postgres.
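Since the question is about MySQL, note that MySQL spells the keyword TEMPORARY, while PostgreSQL accepts both TEMP and TEMPORARY. The rewrite step on its own can be sketched with only the standard library; the helper name make_temporary and the sample statement below are hypothetical, and the sample only imitates the general shape of a generated CREATE TABLE statement.

```python
import re


def make_temporary(create_sql: str, keyword: str = "TEMPORARY") -> str:
    """Rewrite the leading CREATE TABLE of a statement as CREATE <keyword> TABLE."""
    rewritten, n_subs = re.subn(
        r"^\s*CREATE\s+TABLE\b",   # only the statement's leading keyword pair
        f"CREATE {keyword} TABLE",
        create_sql,
        count=1,
        flags=re.IGNORECASE,
    )
    if n_subs == 0:
        # Fail loudly rather than send an unmodified statement to the database.
        raise ValueError("statement does not start with CREATE TABLE")
    return rewritten


# Hypothetical sample statement, standing in for generated schema SQL.
sample_sql = 'CREATE TABLE "tmp_values" (\n"id" INTEGER,\n"value" INTEGER\n)'
temporary_sql = make_temporary(sample_sql)
```

Raising when nothing matched is a deliberate choice here: a silent no-op would create a permanent table instead of a temporary one.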