
How to insert a Pandas DataFrame into MySQL using PyMySQL

I have a DataFrame with around 30,000+ rows and 150+ columns. Currently I am using the following code to insert the data into MySQL, but since it inserts the rows one at a time, it is taking too long to insert them all.

Is there any way I can insert the rows all at once or in batches? The constraint here is that I can use only PyMySQL; I cannot install any other library.

import pymysql
import pandas as pd

# Create dataframe
data = pd.DataFrame({
    'book_id':[12345, 12346, 12347],
    'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price':[29, 23, 27]
})


# Connect to the database
connection = pymysql.connect(host='localhost',
                         user='root',
                         password='12345',
                         db='book')


# create cursor
cursor = connection.cursor()

# creating column list for insertion
cols = "`,`".join([str(i) for i in data.columns.tolist()])

# Insert DataFrame records one by one.
for i,row in data.iterrows():
    sql = "INSERT INTO `book_details` (`" +cols + "`) VALUES (" + "%s,"*(len(row)-1) + "%s)"
    cursor.execute(sql, tuple(row))

    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()

# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

connection.close()

Thank You.

Possible improvements:

  • Remove or disable indexes on the table(s)
  • Take the commit out of the loop (see the sketch below)

Now try and load the data.
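Since the question is restricted to PyMySQL, here is a minimal sketch of both changes combined, sending all rows in a single cursor.executemany call and committing once at the end (it reuses the connection, cursor and data from the question):

# Build the INSERT statement once, outside any loop
cols = "`,`".join(str(c) for c in data.columns)
placeholders = ",".join(["%s"] * len(data.columns))
sql = "INSERT INTO `book_details` (`" + cols + "`) VALUES (" + placeholders + ")"

# executemany sends the whole batch; commit once at the end
rows = list(data.itertuples(index=False, name=None))
cursor.executemany(sql, rows)
connection.commit()

For INSERT ... VALUES statements, PyMySQL rewrites executemany into multi-row inserts, so the batch really does go over in far fewer round trips.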

Alternatively, generate a CSV file and load it with LOAD DATA INFILE; this statement is issued from within MySQL itself.
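For example, a sketch of the statement as typed in the mysql client (the file path is illustrative and depends on the server's secure_file_priv setting; the column list matches the example table):

LOAD DATA INFILE '/var/lib/mysql-files/import-data.csv'
INTO TABLE book_details
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
(`book_id`, `title`, `price`);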

Try using SQLAlchemy to create an engine that you can then use with the pandas DataFrame.to_sql function. This function writes rows from a pandas DataFrame to a SQL database and is much faster than iterating over the DataFrame and using a MySQL cursor.

Your code would look something like this:

import pymysql
import pandas as pd
from sqlalchemy import create_engine

# Create dataframe
data = pd.DataFrame({
    'book_id':[12345, 12346, 12347],
    'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price':[29, 23, 27]
})

db_data = 'mysql+pymysql://root:12345@localhost:3306/book?charset=utf8mb4'
engine = create_engine(db_data)

# Connect to the database
connection = pymysql.connect(host='localhost',
                         user='root',
                         password='12345',
                         db='book')    

# create cursor
cursor = connection.cursor()
# Use to_sql to write the DataFrame into SQL
data.to_sql('book_details', engine, if_exists='append', index=False)    

# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

engine.dispose()
connection.close()

You can take a look at all the options this function has in the pandas documentation.
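For large frames it may also be worth trying the chunksize and method parameters of to_sql; a sketch, with illustrative values (method='multi' packs many rows into each INSERT statement):

data.to_sql('book_details', engine, if_exists='append', index=False,
            chunksize=1000, method='multi')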

It is faster to push a file to the SQL server and let the server manage the input.

So first push the data to a CSV file.

# quoting=2 is csv.QUOTE_NONNUMERIC; na_rep="\\N" writes \N so MySQL loads NULLs
data.to_csv("import-data.csv", header=False, index=False, quoting=2, na_rep="\\N")

And then load it at once into the SQL table.

sql = "LOAD DATA LOCAL INFILE \'import-data.csv\' \
    INTO TABLE book_details FIELDS TERMINATED BY \',\' ENCLOSED BY \'\"\' \
    (`" +cols + "`)"
cursor.execute(sql)
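Note that LOAD DATA LOCAL INFILE only works when both sides allow it: the server's local_infile setting must be enabled, and with PyMySQL you must pass local_infile=True when connecting, for example:

connection = pymysql.connect(host='localhost',
                             user='root',
                             password='12345',
                             db='book',
                             local_infile=True)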
