
Step through CSV file incrementally in Python

I am trying to speed up loading a large CSV file into a MySQL database. Using this code it takes about 4 hours to load a 4GB file:

import csv
from datetime import datetime

# mydb / cursor come from an existing MySQL connection (setup not shown)
with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId) VALUES (%s, %s, %s) """
    for row in csv_reader:
        cursor.execute(insert_sql, row)  # one round trip per row
        print(cursor.rowcount, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
    print("Committing the DB")
    mydb.commit()
cursor.close()
mydb.close()

I want to use the executemany() method to make this faster. For that, you have to pass a list of tuples as the second argument.

If I build up one list across all the row iterations, it eventually gets too large, I run out of memory, and the script crashes.

I am not able to get the length of csv_reader or csv_file to use in a range statement.

How can I loop through the CSV file 1000 rows at a time and store the result in a list, use it in executemany, then store the next 1000 rows, etc until the end of the CSV file?
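
For context, a minimal sketch of that kind of batching (reusing the source, cursor, mydb and insert_sql names from the snippet above; the batch size of 1000 is just the value mentioned in the question) could look like:

batch_size = 1000  # rows per executemany() call

# mydb / cursor come from an existing MySQL connection (not shown)
with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    batch = []
    for row in csv_reader:
        batch.append(row)
        if len(batch) == batch_size:
            cursor.executemany(insert_sql, batch)  # one round trip per 1000 rows
            batch = []
    if batch:
        cursor.executemany(insert_sql, batch)  # flush the final partial batch
    mydb.commit()
cursor.close()
mydb.close()

Committing once at the end (or once per batch) also avoids per-row transaction overhead.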

If you need high-speed inserts into MySQL, you can try:

LOAD DATA LOCAL INFILE '/path/to/my_file.csv' INTO TABLE my_table;
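
If you want to drive that from the same Python script, a rough sketch with mysql-connector-python might look like the following. The connection parameters, file path, and the FIELDS TERMINATED BY / IGNORE 1 LINES clauses are assumptions for a comma-separated file with a header row; the server must also have local_infile enabled.

import mysql.connector

# allow_local_infile=True is needed on the client side for LOCAL INFILE;
# host/user/password/database and the file path are placeholders
conn = mysql.connector.connect(
    host='localhost', user='user', password='password',
    database='mydb', allow_local_infile=True,
)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE '/path/to/my_file.csv'
    INTO TABLE my_table
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES
""")
conn.commit()
cur.close()
conn.close()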

A small hint:

In [1]: import itertools

In [2]: rows = iter(range(10))

In [3]: while True:
   ...:     batch = [*itertools.islice(rows, 3)]
   ...:     if not batch:
   ...:         break
   ...:     print(batch)
   ...:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
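
Applied to the question, the same islice pattern might look roughly like this (a sketch reusing source, cursor, mydb and insert_sql from the question, with 1000 rows per batch):

import csv
import itertools

with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    while True:
        batch = list(itertools.islice(csv_reader, 1000))  # at most 1000 rows
        if not batch:
            break
        cursor.executemany(insert_sql, batch)
    mydb.commit()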

But I agree with @heliosk that a better solution is to use LOAD DATA INFILE for large files. You may also want to disable keys until the import is finished.
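
For the key-disabling part, a hedged sketch reusing the question's table and cursor (note that ALTER TABLE ... DISABLE KEYS only affects non-unique indexes on MyISAM tables; for InnoDB the usual approach is to relax unique_checks and foreign_key_checks for the session):

cursor.execute("ALTER TABLE billing_info_test DISABLE KEYS")  # MyISAM: defer non-unique index maintenance

# InnoDB alternative: relax checks for this session during the load
cursor.execute("SET unique_checks = 0")
cursor.execute("SET foreign_key_checks = 0")

# ... run the bulk INSERT / LOAD DATA here ...

cursor.execute("SET foreign_key_checks = 1")
cursor.execute("SET unique_checks = 1")
cursor.execute("ALTER TABLE billing_info_test ENABLE KEYS")
mydb.commit()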
