Read a CSV and insert into a database: performance
My task is to read a CSV file line by line and insert the rows into a database.
The CSV file contains about 1.7 million lines.
I use Python with the SQLAlchemy ORM (the merge function) to do this, but it takes over five hours.
Is this caused by Python's slow performance, or by SQLAlchemy?
Or would using Golang give obviously better performance? (I have no experience with Go, though. Besides, this job needs to be scheduled every month.)
I'd appreciate any suggestions, thanks!
Update: the database is MySQL.
For a task like this, you don't want to insert data line by line :) Basically, you have two options:
1. Use a batch INSERT query (see "How to do a batch insert in MySQL") instead.
2. Process the data the way you need, write it to a temporary CSV file, and then run LOAD DATA [LOCAL] INFILE as suggested above.
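To illustrate the batch INSERT option, here is a minimal sketch that sends rows in chunks through the DB-API `executemany` call instead of one statement per row. The helper names `read_rows` and `insert_in_batches`, the batch size of 1000, and the presence of a header row are assumptions made for illustration; they do not come from the question or the answers.

```python
import csv
from itertools import islice


def read_rows(csv_path, skip_header=True):
    """Yield each CSV line as a tuple of strings."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        if skip_header:
            next(reader, None)  # assumption: the first line holds column names
        for row in reader:
            yield tuple(row)


def insert_in_batches(connection, sql, rows, batch_size=1000):
    """Insert rows in chunks with executemany and return the number of rows sent."""
    cursor = connection.cursor()
    rows = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows so the whole file never sits in memory.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        cursor.executemany(sql, batch)
        total += len(batch)
    connection.commit()  # a single commit at the end rather than one per row
    cursor.close()
    return total
```

With mysql.connector the placeholder style is `%s`, so a call might look like `insert_in_batches(database, "INSERT INTO people (id, name) VALUES (%s, %s)", read_rows("people.csv"))`, where the `people` table and its columns are hypothetical. Whether this is fast enough for 1.7 million rows depends on the schema and the server, so it is worth timing it on a sample first.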
If you don't need to preprocess your data, just feed the CSV to the database (I assume it's MySQL).
Follow the three steps below:
Python code:
import pandas as pd
from mysql.connector import connect

csv_file = 'dbtable_name.csv'
df = pd.read_csv(csv_file)

# Use the file name (without its extension) as the table name.
table_name = csv_file.split('.')[0]

# Build a CREATE TABLE statement from the DataFrame's column names and dtypes.
query = "CREATE TABLE " + table_name + " (\n"
n_cols = len(df.columns)
for count, (column, dtype) in enumerate(zip(df.columns, df.dtypes)):
    query += column
    if dtype == 'int64':
        query += "\t\t int(11) NOT NULL"
    elif dtype == 'object':
        query += "\t\t varchar(64) NOT NULL"
    elif dtype == 'float64':
        query += "\t\t float(10,2) NOT NULL"
    # The first column becomes the primary key.
    if count == 0:
        query += " PRIMARY KEY"
    # Separate column definitions with commas, except after the last one.
    if count < n_cols - 1:
        query += ",\n"
query += " );"
# print(query)

database = connect(host='localhost',   # your host
                   user='username',    # username
                   passwd='password',  # password
                   db='dbname')        # database name
curs = database.cursor(dictionary=True)
curs.execute(query)
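The remaining steps of this answer are not shown here. As one possibility, and not necessarily what the author did next, the LOAD DATA [LOCAL] INFILE route mentioned earlier could be sketched as below. The helper names `build_load_data_sql` and `load_csv` are hypothetical, and the sketch assumes a comma-separated file with a header row whose columns are in the same order as the table's.

```python
def build_load_data_sql(csv_path, table_name, skip_header=True):
    """Return a LOAD DATA LOCAL INFILE statement for a comma-separated file."""
    # Only allow simple identifiers, since the table name is placed in the SQL text.
    if not table_name.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table_name)
    # Escape backslashes and single quotes so the path is a valid SQL string literal.
    quoted_path = csv_path.replace("\\", "\\\\").replace("'", "\\'")
    sql = (
        "LOAD DATA LOCAL INFILE '" + quoted_path + "' "
        "INTO TABLE `" + table_name + "` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )
    if skip_header:
        sql += " IGNORE 1 LINES"  # assumption: the first line holds column names
    return sql


def load_csv(cursor, csv_path, table_name):
    """Run the generated statement on an already opened cursor."""
    cursor.execute(build_load_data_sql(csv_path, table_name))
```

With mysql.connector this would typically need the connection to be opened with `allow_local_infile=True`, and the server's `local_infile` setting to be enabled; after `load_csv(curs, csv_file, table_name)` a `database.commit()` is still required. Files with Windows line endings would need `'\r\n'` as the line terminator instead.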