Efficient way to store a large amount of HTML content

I have a database with a table that has fields like "title, album, artist..." and also many fields (up to 30) containing HTML content for every record.

The problem is that this database has tens of thousands of records and is hundreds of megabytes in size because of the HTML content. Because of the size of the SQLite file, searching is very slow, and inserting new rows in a transaction is also slow (about 10-30 seconds for 200 new rows). The very first LIKE query can take 10-15 seconds; subsequent searches are fast enough (indices are created and work fine). When I removed the HTML content from the database, search was always instant.

So the question is: what is the best way to store that additional HTML content? Right now I am experimenting with storing it in separate files, but that can generate up to 600k files (and more in the future), which is quite slow to create. Storing the files in a single zip archive will probably hit its limit on the number of files. Other options are to zip the files per table row, to store the HTML in a separate table in the same database, or to create a separate database file for the HTML content.

Which option will give me the best performance? Or are there other, better options? I need quick insert, update, and search.

There are a couple of different things you could consider doing:

  • Split the data into separate tables. You could then have 1:1 mappings between the tables and join them only when necessary, which speeds up queries that do not need the HTML.
  • Check your indexes. Just because you have them and you think they're working doesn't mean they are. If I recall correctly, SQLite generally uses at most one index per table in a query, so you need to make sure the best possible index is available for the queries you're running. The ANALYZE command can help with that.
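Both suggestions above can be tried quickly from Python's built-in `sqlite3` module. The following is only a minimal sketch: the table names `track` and `track_html`, the index `idx_track_artist`, and the helper functions are assumptions made up for illustration, not the asker's actual schema. It splits the HTML into a 1:1 side table and uses `EXPLAIN QUERY PLAN` to see what SQLite intends to do for a search query.

```python
import sqlite3


def build_split_schema(conn):
    """Create a narrow search table plus a 1:1 table holding the HTML.

    All table, column, and index names here are hypothetical.
    """
    conn.executescript("""
        CREATE TABLE track (
            id     INTEGER PRIMARY KEY,
            title  TEXT,
            album  TEXT,
            artist TEXT
        );
        CREATE TABLE track_html (
            track_id INTEGER PRIMARY KEY REFERENCES track(id),
            html     TEXT
        );
        CREATE INDEX idx_track_artist ON track(artist);
    """)


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
build_split_schema(conn)
conn.execute("INSERT INTO track VALUES (1, 'Song', 'Album', 'Artist')")
conn.execute("INSERT INTO track_html VALUES (1, '<p>long html</p>')")
conn.execute("ANALYZE")  # refresh the statistics the query planner uses

# The search only touches the narrow table; the plan shows which
# index (if any) SQLite picks for it.
plan = query_plan(
    conn, "SELECT id, title FROM track WHERE artist = ?", ("Artist",)
)
for line in plan:
    print(line)

# The HTML is fetched separately, by primary key, only when needed.
html = conn.execute(
    "SELECT html FROM track_html WHERE track_id = ?", (1,)
).fetchone()[0]
```

If the printed plan reports a full scan where an index lookup was expected, that is a hint that the index does not match the query. Note that a `LIKE` pattern starting with `%` generally cannot use an ordinary index.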

After some days of experimenting I came to this conclusion:

  • One database file with one table was the slowest (up to 10 seconds).
  • One database file with two tables was, in the worst case, twice as fast as the single table.
  • Fastest was to have two separate database files: one with the data needed for search and the other for the huge HTML data. The worst case is about 300 ms, and in normal usage it is instant.

So I recommend using two separate database files in this scenario. Unless someone comes up with a faster or better solution, I will accept this as the answer.
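For reference, one way to set up the two-file layout is SQLite's `ATTACH DATABASE`, which lets a single connection work with both files. This is a minimal sketch under assumptions: the file names `search.db` and `html.db`, the schema alias `content`, the table names, and the helper functions are all hypothetical, and it is not necessarily how the measurements above were made.

```python
import os
import sqlite3
import tempfile


def open_split_databases(directory):
    """Open the search database and attach the HTML database to it.

    File, schema, and table names are hypothetical.
    """
    search_path = os.path.join(directory, "search.db")
    html_path = os.path.join(directory, "html.db")
    conn = sqlite3.connect(search_path)
    conn.execute("ATTACH DATABASE ? AS content", (html_path,))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS main.track ("
        "id INTEGER PRIMARY KEY, title TEXT, artist TEXT)"
    )
    # Foreign keys cannot span database files, so track_id is kept
    # in sync by the application code below.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS content.track_html ("
        "track_id INTEGER PRIMARY KEY, html TEXT)"
    )
    return conn


def add_track(conn, title, artist, html):
    """Insert the searchable fields and the HTML in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT INTO main.track (title, artist) VALUES (?, ?)",
            (title, artist),
        )
        conn.execute(
            "INSERT INTO content.track_html (track_id, html) VALUES (?, ?)",
            (cur.lastrowid, html),
        )
    return cur.lastrowid


with tempfile.TemporaryDirectory() as tmp:
    conn = open_split_databases(tmp)
    track_id = add_track(conn, "Song", "Artist", "<p>long html</p>")

    # Searching reads only the small search.db file.
    hits = conn.execute(
        "SELECT id, title FROM main.track WHERE title LIKE ?", ("%son%",)
    ).fetchall()

    # The HTML is loaded from html.db by primary key when a row is opened.
    html = conn.execute(
        "SELECT html FROM content.track_html WHERE track_id = ?", (track_id,)
    ).fetchone()[0]
    conn.close()
```

Keeping the search file small means a `LIKE` scan reads far fewer pages, which is consistent with the timings reported above.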
