I'm developing a website where users import CSV files directly into a database, plus a front end that performs some data analytics on the data once it has been loaded into the database. I'm using pandas to convert the CSV to a DataFrame and then import that DataFrame into the MySQL database:
Import to MySQL database:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[username]:[password]@[host]:[port]/[schema]', echo=False)
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df.to_sql(con=engine, name='data', if_exists='replace')
The problem is that for the datasets I work with (5 million rows), this is too slow and the operation times out before the data is imported. However, if I try the same thing with SQLite3:
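One thing worth trying before switching databases is batching the insert: `to_sql` accepts a `chunksize` and a `method="multi"` argument, which send rows in bounded batches and pack many rows into each INSERT statement rather than one row per statement. A minimal sketch of that call, using an in-memory SQLite engine and a small synthetic frame purely for illustration (the same `to_sql` call applies unchanged to a `mysql+mysqlconnector` engine URL):

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Small synthetic frame standing in for the 5M-row CSV.
df = pd.DataFrame({"region": ["EU", "US"] * 500, "units": range(1000)})

# In-memory SQLite engine for illustration only; swap in your
# 'mysql+mysqlconnector://...' URL to target MySQL instead.
engine = create_engine("sqlite://", echo=False)

# chunksize bounds how many rows go per round trip; method="multi"
# emits multi-row INSERT statements instead of one row apiece.
df.to_sql("data", con=engine, if_exists="replace", index=False,
          chunksize=200, method="multi")

with engine.connect() as conn:
    n = conn.execute(text("SELECT COUNT(*) FROM data")).scalar()
print(n)
```

Whether this is enough for 5 million rows depends on the server and network, but it usually cuts the per-row round-trip overhead substantially compared with the default one-INSERT-per-row behavior.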
Import to SQLite3 database:
import sqlite3
import pandas as pd

conn = sqlite3.connect('customer.db')
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df.to_sql('Sales', conn, if_exists='append', index=False)
mycursor = conn.cursor()
query = 'SELECT * FROM Sales LIMIT 10'
print(mycursor.execute(query).fetchall())
This block of code executes in seconds and imports all 5 million rows of the dataset. So what should I do? I do not anticipate multiple people passing in large datasets all at the same time so I suppose it would not hurt to just ditch MySQL for the clear performance advantages provided by SQLite in this application. It just feels like there's a better way though...
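If memory pressure is part of the problem (a 5M-row `read_csv` materializes the whole frame at once), the CSV can also be streamed in chunks and appended per chunk, so no full copy ever sits in memory. A sketch, using an in-memory SQLite connection and a synthetic in-memory CSV in place of the real file path (the same loop works with a SQLAlchemy engine for MySQL):

```python
import io
import sqlite3
import pandas as pd

# Synthetic CSV contents standing in for Sales_Records.csv.
csv_data = io.StringIO(
    "id,amount\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))
)

conn = sqlite3.connect(":memory:")

# Stream the file in fixed-size chunks; each chunk is appended,
# so the full 5M-row frame never has to exist in memory at once.
for chunk in pd.read_csv(csv_data, chunksize=250):
    chunk.to_sql("Sales", conn, if_exists="append", index=False)

rows = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(rows)
```

Chunked reading combines naturally with the `chunksize`/`method="multi"` options on `to_sql` itself when the target is a remote MySQL server.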
MySQL sends the data to disk over a network connection.
SQLite3 writes the data to disk directly.
Look at https://gist.github.com/jboner/2841832
You did not mention where the MySQL server is. But even if it is on your local machine, writes will pass through a TCP/IP stack, whereas SQLite just writes directly to disk.