
How to insert a Pandas Dataframe into MySql using PyMySQL

I have a DataFrame with more than 30,000 rows and more than 150 columns. I am currently using the following code to insert the data into MySQL, but because it processes the rows one at a time, inserting all of them takes too long.

Is there any way to insert the rows all at once or in batches? The constraint is that I can only use PyMySQL; I cannot install any other library.

import pymysql
import pandas as pd

# Create dataframe
data = pd.DataFrame({
    'book_id':[12345, 12346, 12347],
    'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price':[29, 23, 27]
})


# Connect to the database
connection = pymysql.connect(host='localhost',
                         user='root',
                         password='12345',
                         db='book')


# create cursor
cursor=connection.cursor()

# creating column list for insertion
cols = "`,`".join([str(i) for i in data.columns.tolist()])

# Insert DataFrame records one by one.
for i,row in data.iterrows():
    sql = "INSERT INTO `book_details` (`" +cols + "`) VALUES (" + "%s,"*(len(row)-1) + "%s)"
    cursor.execute(sql, tuple(row))

    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()

# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

connection.close()

Thank you.

Possible improvements:

  • Remove or disable indexes on the table(s)
  • Take the commit out of the loop

Now try loading the data.
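
As a rough sketch of the second point combined with batching, the rows can be passed to PyMySQL's `cursor.executemany` in chunks, with a single commit at the end. The helper names (`build_insert_sql`, `batches`, `insert_in_batches`) and the default batch size of 1000 are illustrative assumptions, not part of the original code; table and column names are assumed to be trusted, since they are interpolated into the statement.

```python
from itertools import islice


def build_insert_sql(table, columns):
    """Return a parameterised INSERT statement for the given columns.

    Table and column names are assumed to be trusted (hypothetical helper).
    """
    col_list = ", ".join("`{}`".format(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO `{}` ({}) VALUES ({})".format(table, col_list, placeholders)


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(connection, table, columns, rows, batch_size=1000):
    """Insert rows chunk by chunk and commit once after the last chunk."""
    sql = build_insert_sql(table, columns)
    total = 0
    with connection.cursor() as cursor:
        for chunk in batches(rows, batch_size):
            # One executemany call per chunk instead of one execute per row.
            cursor.executemany(sql, chunk)
            total += len(chunk)
    # Single commit outside the loop.
    connection.commit()
    return total


# Possible usage with the DataFrame and connection from the question:
#   rows = data.itertuples(index=False, name=None)
#   insert_in_batches(connection, "book_details", list(data.columns), rows)
```

With a real DataFrame, missing values (NaN) would probably need to be converted to `None` first so that they are stored as NULL.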

Generate a CSV file and load it using LOAD DATA INFILE; this would be issued from within MySQL.

Try using SQLAlchemy to create an Engine that you can later use with the pandas df.to_sql function. This function writes rows from a pandas DataFrame to a SQL database, and it is much faster than iterating over your DataFrame and using the MySQL cursor.

Your code would look something like this:

import pymysql
import pandas as pd
from sqlalchemy import create_engine

# Create dataframe
data = pd.DataFrame({
    'book_id':[12345, 12346, 12347],
    'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price':[29, 23, 27]
})

# Use the PyMySQL driver (mysql+pymysql), since PyMySQL is the installed connector
db_data = 'mysql+pymysql://' + 'root' + ':' + '12345' + '@' + 'localhost' + ':3306/' \
       + 'book' + '?charset=utf8mb4'
engine = create_engine(db_data)

# Connect to the database
connection = pymysql.connect(host='localhost',
                         user='root',
                         password='12345',
                         db='book')    

# create cursor
cursor=connection.cursor()
# Write the DataFrame into the SQL table with to_sql
data.to_sql('book_details', engine, if_exists='append', index=False)    

# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

engine.dispose()
connection.close()

You can look at all the options this function has in the pandas documentation.

It is faster to push a file to the SQL server and let the server manage the input.

So first write the data to a CSV file.

data.to_csv("import-data.csv", header=False, index=False, quoting=2, na_rep="\\N")

And then load it into the SQL table all at once.

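# LOAD DATA LOCAL needs local_infile=True in pymysql.connect() and local_infile enabled on the server
sql = ("LOAD DATA LOCAL INFILE 'import-data.csv' "
       "INTO TABLE book_details FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
       "(`" + cols + "`)")
cursor.execute(sql)
# autocommit is off by default, so commit to save the loaded rows
connection.commit()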
