
Python: sqlite3 - how to speed up updating of the database

I have a database, which I store as a .db file on my disk. I implemented all the functions necessary for managing this database using sqlite3. However, I noticed that updating the rows in the table takes a large amount of time. My database currently has 608042 rows. The database has one table; let's call it Table1. This table consists of the following columns:

id | name | age | address | job | phone | income

(The id value is generated automatically when a row is inserted into the database.) After reading in all the rows, I perform some operations (ML algorithms for predicting the income) on the values from the rows, and next I have to update the value of income for each row (thus, for each of the 608042 rows I perform an SQL update operation). In order to update, I'm using the following function (copied from my class):

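def update_row(self, new_value, idkey):
    # Parameterized query; note that idkey is matched against the name column.
    update_query = "UPDATE Table1 SET income = ? WHERE name = ?"
    self.cursor.execute(update_query, (new_value, idkey))
    # Commits after every single row.
    self.db.commit()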

And I call this function for each person registered in the database.

# pseudocode: rows holds the 608042 rows read from Table1
for i in rows:
    update_row(new_income_i, i.name)  # new_income_i: value predicted for row i

(The values of new_income_i are different for each i.) This takes a huge amount of time, even though the dataset is not giant. Is there any way to speed up the updating of the database? Should I use something other than sqlite3? Or, instead of storing the database as a .db file, should I keep it in memory (using sqlite3.connect(":memory:"))?

Each UPDATE statement must scan the entire table to find any row(s) that match the name.

An index on the name column would prevent this and make the search much faster. (See "Query Planning" and "How does database indexing work?")
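As a rough illustration of the index suggestion, the sketch below builds an empty in-memory table with the column layout from the question and asks SQLite for its query plan before and after creating an index on name. The column types, the helper lookup_plan and the index name idx_table1_name are assumptions made for this example; the exact wording of the plan text also varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

# In-memory stand-in for the .db file; column types are assumed.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "address TEXT, job TEXT, phone TEXT, income REAL)"
)

def lookup_plan(conn):
    """Return SQLite's plan text for the UPDATE used in the question."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN UPDATE Table1 SET income = ? WHERE name = ?",
        (0.0, "x"),
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " ".join(row[-1] for row in rows)

plan_before = lookup_plan(db)  # no index on name yet: a table scan

# idx_table1_name is a hypothetical index name chosen for this example.
db.execute("CREATE INDEX IF NOT EXISTS idx_table1_name ON Table1(name)")

plan_after = lookup_plan(db)   # the plan should now refer to the index
print(plan_before)
print(plan_after)
```

The index only needs to be created once; it is stored in the database file and SQLite keeps it up to date on later inserts and updates.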

However, if the name column is not unique, then that value is not even suitable to find individual rows: each update with a duplicate name would modify all rows with the same name. So you should use the id column to identify the row to be updated; and as the primary key, this column already has an implicit index.
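A minimal sketch of updating by id instead of name, assuming the predictions are available as (id, income) pairs. The helper update_incomes and the reduced three-column table are hypothetical and only serve the demonstration. The sketch also sends all updates in one transaction with executemany, which goes beyond what is said above: it is a separate, commonly used way to avoid paying for a commit after every row.

```python
import sqlite3

def update_incomes(db, new_incomes):
    """Write many (row_id, new_income) pairs in a single transaction.

    new_incomes: iterable of (row_id, new_income) tuples (assumed shape).
    """
    # Parameter order must match the placeholders: income first, then id.
    params = [(income, row_id) for row_id, income in new_incomes]
    with db:  # commits once on success, rolls back on an exception
        db.executemany("UPDATE Table1 SET income = ? WHERE id = ?", params)

# Small demonstration with two people who share a name.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, income REAL)")
db.executemany(
    "INSERT INTO Table1 (name, income) VALUES (?, ?)",
    [("Ann", 0.0), ("Ann", 0.0), ("Bob", 0.0)],
)
db.commit()

update_incomes(db, [(1, 100.0), (2, 200.0), (3, 300.0)])
result = db.execute("SELECT id, name, income FROM Table1 ORDER BY id").fetchall()
print(result)
```

Because each row is addressed by its id, the two rows named Ann receive different incomes, which a WHERE name = ? update could not guarantee.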

