[英]How to insert a CSV file data into MYSQL using Python efficiently?
我有一個帶有 aprox 的 CSV 輸入文件。 400 萬條記錄。 插入自 +2 小時以來一直在運行,但仍未完成。 數據庫仍然是空的。
關於如何實際插入值(使用insert into
)和更快的任何建議,例如insert into
塊?
我對python很陌生。
43293,cancelled,1,0.0,
1049007,cancelled,1,0.0,
438255,live,1,0.0,classA
1007255,xpto,1,0.0,
def csv_to_DB(xing_csv_input, db_opts):
print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
conn = pymysql.connect(**db_opts)
cur = conn.cursor()
try:
with open(xing_csv_input, newline='') as csvfile:
csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in csv_data:
insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
cur.execute(insert_str, row)
conn.commit()
finally:
conn.close()
更新:感謝所有的投入。 按照建議,我嘗試了一個計數器以批量插入 100 個和較小的 csv 數據集(1000 行)。 現在的問題是只插入了 100 條記錄,盡管計數器多次通過 10 x 100。
代碼更改:
def csv_to_DB(xing_csv_input, db_opts):
print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
conn = pymysql.connect(**db_opts)
cur = conn.cursor()
count = 0
try:
with open(xing_csv_input, newline='') as csvfile:
csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in csv_data:
count += 1
print(count)
insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
if count >= 100:
cur.execute(insert_str, row)
print("count100")
conn.commit()
count = 0
if not row:
cur.execute(insert_str, row)
conn.commit()
finally:
conn.close()
有很多方法可以優化這個插入。 這里有一些想法:
commit()
例子:
對於列表中的數字 2,代碼將具有以下結構:
def csv_to_DB(xing_csv_input, db_opts):
print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
conn = pymysql.connect(**db_opts)
cur = conn.cursor()
try:
with open(xing_csv_input, newline='') as csvfile:
csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
to_insert = []
insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES "
template = '(%s, %s, %s, %s, %s)'
count = 0
for row in csv_data:
count += 1
to_insert.append(tuple(row))
if count % 100 == 0:
query = insert_str + '\n'.join([template % r for r in to_insert])
cur.execute(query)
to_insert = []
conn.commit()
query = insert_str + '\n'.join(template % to_insert)
cur.execute(query)
conn.commit()
finally:
conn.close()
這里。 試試這個片段,讓我知道它是否使用executemany()
。
with open(xing_csv_input, newline='') as csvfile:
csv_data = tuple(csv.reader(csvfile, delimiter=',', quotechar='"'))
csv_data = (row for row in csv_data)
query = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
try:
cur.executemany(query, csv_data)
conn.commit()
except:
conn.rollback()
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.