
Python Pandas MySQL - Why is SQLite so much faster when writing dataframes to a database

I'm developing a website where users import CSV files directly into a database, with a front end that runs some data analytics once the data has been stored there. I'm using pandas to convert the CSV to a dataframe and then import that dataframe into the MySQL database:

Import to MySQL database:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[username]:[password]@[host]:[port]/[schema]', echo=False)
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df.to_sql(con=engine, name='data', if_exists='replace')

The problem is that for the datasets I work with (5 million rows), this is too slow: the operation times out without importing the data. However, if I try the same thing using SQLite3 instead:

Import to SQLite3 database:

import sqlite3
import pandas as pd

conn = sqlite3.connect('customer.db')
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df.to_sql('Sales', conn, if_exists='append', index=False)
# Read back a few rows to confirm the import worked
mycursor = conn.cursor()
query = 'SELECT * FROM Sales LIMIT 10'
print(mycursor.execute(query).fetchall())

This block of code executes in seconds and imports all 5 million rows of the dataset. So what should I do? I do not anticipate multiple people passing in large datasets at the same time, so I suppose it would not hurt to just ditch MySQL for the clear performance advantage SQLite provides in this application. It just feels like there's a better way though...

MySQL sends the data to a disk over a network connection.

SQLite3 writes the data to disk directly.

Look at https://gist.github.com/jboner/2841832
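To make the latency argument concrete, here is a rough back-of-envelope sketch. It assumes the roughly 0.5 ms "round trip within same datacenter" figure from the linked gist, and it compares a hypothetical worst case of one round trip per row against a hypothetical batch of 1,000 rows per round trip. Whether a given driver really issues one round trip per row depends on the driver and on how to_sql is called, so treat these numbers as an illustration rather than a measurement.

```python
# Back-of-envelope estimate; every figure here is an assumption, not a measurement.
ROWS = 5_000_000        # dataset size from the question
ROUND_TRIP_S = 500e-6   # ~0.5 ms, "round trip within same datacenter" in the linked gist
BATCH_SIZE = 1_000      # hypothetical number of rows sent per round trip


def network_wait_seconds(rows: int, rows_per_round_trip: int, round_trip_s: float) -> float:
    """Time spent only waiting on round trips, ignoring server and disk work."""
    round_trips = -(-rows // rows_per_round_trip)  # ceiling division
    return round_trips * round_trip_s


per_row = network_wait_seconds(ROWS, 1, ROUND_TRIP_S)
batched = network_wait_seconds(ROWS, BATCH_SIZE, ROUND_TRIP_S)

print(f"one row per round trip: {per_row:.0f} s")
print(f"{BATCH_SIZE} rows per round trip: {batched:.1f} s")
```

Under these assumptions the per-row case spends about 2,500 seconds just waiting on the network, while the batched case spends about 2.5 seconds. A local SQLite file has no such per-request network cost.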

You did not mention where the MySQL server is. But even if it were on your local machine, the data would pass through a TCP/IP stack, whereas SQLite just writes directly to disk.
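As a small illustration of the "writes directly to disk" point, the sketch below uses only Python's standard sqlite3 module. The file name, table name, and row contents are hypothetical and exist only for this example. The connection is just an open file inside the Python process, so no server or socket is involved, and all rows go in as a single transaction.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical throwaway database file and table, used only for this sketch.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
rows = [(i, f"item-{i}") for i in range(100_000)]

conn = sqlite3.connect(db_path)  # opens a local file; no server, no socket
conn.execute("CREATE TABLE sales (id INTEGER, label TEXT)")

start = time.perf_counter()
with conn:  # one transaction, committed when the block exits
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
elapsed = time.perf_counter() - start

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
conn.close()

print(f"{row_count} rows written to {db_path} in {elapsed:.2f} s")
```

The elapsed time will vary by machine, so no figure is quoted here. The point is only that the whole write path stays inside one process and one file.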
