I have a task: read a CSV file line by line and insert the rows into a database.
The CSV file contains about 1.7 million lines.
I use Python with the SQLAlchemy ORM (its merge function) to do this, but it takes over five hours.
Is this caused by Python's slow performance, or by SQLAlchemy?
Or should I use Golang instead to get obviously better performance? (But I have no experience with Go. Also, this job needs to be scheduled every month.)
Hope you guys can give some suggestions, thanks!
Update: the database is MySQL.
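The row-by-row ORM approach described above can be sketched roughly as follows; the schema and names are made up, and SQLite stands in for MySQL. Note that each `merge()` issues a SELECT before it writes, which is a large part of why 1.7 million rows take hours:

```python
# Minimal sketch of the row-by-row merge loop (hypothetical schema;
# SQLite in-memory engine stands in for the real MySQL database).
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = 'records'
    id = Column(Integer, primary_key=True)
    name = Column(String(64))

engine = create_engine('sqlite://')  # stand-in for the MySQL connection URL
Base.metadata.create_all(engine)

with Session(engine) as session:
    rows = [['1', 'alice'], ['2', 'bob']]  # stand-in for csv.reader(...)
    for rid, name in rows:
        # merge() SELECTs the row first, then INSERTs or UPDATEs it:
        # two round trips per CSV line.
        session.merge(Record(id=int(rid), name=name))
    session.commit()
```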
For such a task you don't want to insert the data line by line :) Basically, you have two options:

1. A batched INSERT query (see "How to do a batch insert in MySQL") instead of one INSERT per row.
2. LOAD DATA [LOCAL] INFILE, as suggested above. If you don't need to preprocess your data, just feed the CSV straight to the database (I assume it's MySQL).
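The first option (multi-row INSERTs sent in chunks) can be sketched like this; the chunking helper is plain Python, while the table name, column handling, and connection details in the usage comment are assumptions:

```python
import csv
from itertools import islice

def batches(rows, size):
    """Yield successive lists of up to `size` rows from an iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Hypothetical usage with mysql.connector (all names are placeholders):
#
# conn = connect(host='localhost', user='username', passwd='password', db='dbname')
# curs = conn.cursor()
# with open('dbtable_name.csv', newline='') as f:
#     reader = csv.reader(f)
#     header = next(reader)  # consume the header row
#     sql = "INSERT INTO dbtable_name ({}) VALUES ({})".format(
#         ", ".join(header), ", ".join(["%s"] * len(header)))
#     for chunk in batches(reader, 10_000):
#         curs.executemany(sql, chunk)  # one multi-row statement per 10k rows
#     conn.commit()
```

One commit at the end (rather than one per row) also matters: autocommitting every INSERT forces a disk flush per line.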
PYTHON CODE:

import pandas as pd
from mysql.connector import connect

csv_file = 'dbtable_name.csv'
df = pd.read_csv(csv_file)
table_name = csv_file.split('.')  # table name is taken from the file name

# Build a CREATE TABLE statement from the DataFrame's column names and dtypes
query = "CREATE TABLE " + table_name[0] + " (\n"
for count in range(df.columns.size):
    query += df.columns.values[count]
    if df.dtypes.iloc[count] == 'int64':
        query += "\t\t int(11) NOT NULL"
    elif df.dtypes.iloc[count] == 'object':
        query += "\t\t varchar(64) NOT NULL"
    elif df.dtypes.iloc[count] == 'float64':
        query += "\t\t float(10,2) NOT NULL"
    if count == 0:
        query += " PRIMARY KEY"  # first column becomes the primary key
    if count < df.columns.size - 1:
        query += ",\n"
query += " );"
# print(query)

database = connect(host='localhost',  # your host
                   user='username',   # username
                   passwd='password', # password
                   db='dbname')       # database name
curs = database.cursor(dictionary=True)
curs.execute(query)
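Once the table exists, the CSV itself can be bulk-loaded with LOAD DATA LOCAL INFILE instead of row-by-row INSERTs. A sketch, reusing the file and table names from the snippet above; LOCAL INFILE must be enabled on the server and in the connector (e.g. `connect(..., allow_local_infile=True)`):

```python
# Sketch: bulk-load the CSV into the freshly created table.
# Assumes the same csv_file/table_name as the snippet above.
csv_file = 'dbtable_name.csv'
table_name = csv_file.split('.')

load_query = (
    "LOAD DATA LOCAL INFILE '{}' "
    "INTO TABLE {} "
    "FIELDS TERMINATED BY ',' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 LINES".format(csv_file, table_name[0])  # skip the header row
)
# curs.execute(load_query)   # curs/database from the snippet above
# database.commit()
```

For a 1.7-million-row file this should finish in minutes rather than hours, since the server parses and writes the file in one pass.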