
Efficient way to store a large amount of HTML content

I have a database with a table that has fields like "title, album, artist..." and also many fields with HTML content for every record (up to 30).

The problem is that this database has tens of thousands of records and is hundreds of megabytes large because of the HTML content. Because of the size of the SQLite file, searching is very slow (inserting new rows in a transaction is also very slow: about 10-30 seconds for 200 new rows). The very first LIKE query can take 10-15 seconds; later searches are fast enough (indices are created and work OK). When I removed the HTML content from the database, the search was always instant.

So the question is: what is the best way to store that additional HTML content? Right now I am experimenting with storing it in separate files, but that can generate up to 600k files, and more in the future, which is quite slow to create. Storing the files in a zip archive will probably hit its file number limit. Other options are to zip the files per table row, store the HTML in a separate table in the same database, or create a separate database file for the HTML content.

What will give me the best performance? Or are there other, better options? I need quick insert, update and search.

There are a couple of different things you could consider doing:

  • Split the data into separate tables. You could then have 1:1 mappings between the tables and only join the HTML tables in when necessary, which speeds up the queries that don't need them.
  • Check your indexes. Just because you have them and you think they're working doesn't mean they are. If I recall correctly, SQLite will use at most one index per table in a query, so you need to make sure the best possible index is available for the queries you're running. The ANALYZE command can help with that.
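Both suggestions can be tried out in a few lines. The following is only a minimal sketch using Python's standard `sqlite3` module; the table, column and index names (`tracks`, `track_html`, `idx_tracks_artist`) are made up for illustration and are not taken from the question. It keeps the HTML in a 1:1 side table and uses `EXPLAIN QUERY PLAN` to see whether a query actually uses the index.

```python
import sqlite3

# In-memory database for the demo; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Small, searchable columns only.
    CREATE TABLE tracks (
        id     INTEGER PRIMARY KEY,
        title  TEXT,
        album  TEXT,
        artist TEXT
    );
    -- 1:1 side table that holds the bulky HTML.
    CREATE TABLE track_html (
        track_id INTEGER PRIMARY KEY REFERENCES tracks(id),
        html     TEXT
    );
    CREATE INDEX idx_tracks_artist ON tracks(artist);
""")

# Sample data: 1000 rows spread over 100 artists.
with conn:
    for i in range(1, 1001):
        conn.execute(
            "INSERT INTO tracks (id, title, album, artist) VALUES (?, ?, ?, ?)",
            (i, f"Song {i}", f"Album {i % 50}", f"Artist {i % 100}"),
        )
        conn.execute(
            "INSERT INTO track_html (track_id, html) VALUES (?, ?)",
            (i, f"<p>long html for song {i}</p>"),
        )

# Collect statistics so the query planner can choose indexes sensibly.
conn.execute("ANALYZE")

def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# The search touches only the small table; the plan shows which index is used.
plan = query_plan("SELECT id, title FROM tracks WHERE artist = ?", ("Artist 7",))
print(plan)

# Join the HTML in only when a single record is actually displayed.
html_for_7 = conn.execute(
    "SELECT h.html FROM tracks t JOIN track_html h ON h.track_id = t.id "
    "WHERE t.id = ?",
    (7,),
).fetchone()[0]
```

If the plan reports a scan of the whole table instead of a search using the index, the index is not helping that query.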

After some days of experimenting I came to this conclusion:

  • One database file with one table was the slowest (up to 10 seconds).
  • One database with two tables was twice as fast as one table in the worst-case scenario.
  • Fastest was to have two separate database files: one with the data needed for search and the other for the huge HTML data. This takes about 300 ms in the worst case and is instant in normal usage.

So I recommend using two separate database files in this scenario. If no one comes up with a faster/better solution, I will accept this as the answer.
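For anyone who wants to try the two-file layout, here is a rough sketch using Python's `sqlite3` module and SQLite's `ATTACH DATABASE`. The file names (`search.db`, `html.db`), the schema alias `htmldb` and the table names are assumptions for the example, not the ones I actually used. One connection opens the small search database and attaches the HTML database, so both can be written in the same transaction while searches read only the small file.

```python
import os
import sqlite3
import tempfile

# Hypothetical file names, placed in a temporary directory for the demo.
workdir = tempfile.mkdtemp()
search_path = os.path.join(workdir, "search.db")
html_path = os.path.join(workdir, "html.db")

# Open the small search database and attach the HTML database to it.
conn = sqlite3.connect(search_path)
conn.execute("ATTACH DATABASE ? AS htmldb", (html_path,))

conn.executescript("""
    CREATE TABLE main.tracks (
        id     INTEGER PRIMARY KEY,
        title  TEXT,
        album  TEXT,
        artist TEXT
    );
    -- Lives in the second file. There is no REFERENCES clause because
    -- SQLite foreign keys cannot point into another database file.
    CREATE TABLE htmldb.track_html (
        track_id INTEGER PRIMARY KEY,
        html     TEXT
    );
""")

# One transaction can write to both files.
with conn:
    conn.execute(
        "INSERT INTO main.tracks (id, title, album, artist) VALUES (?, ?, ?, ?)",
        (1, "Song", "Album", "Artist"),
    )
    conn.execute(
        "INSERT INTO htmldb.track_html (track_id, html) VALUES (?, ?)",
        (1, "<p>long html</p>"),
    )

# Searching reads only the small file.
hits = conn.execute(
    "SELECT id, title FROM main.tracks WHERE title LIKE ?", ("%Son%",)
).fetchall()

# The HTML is fetched by key only for the record being displayed.
page = conn.execute(
    "SELECT html FROM htmldb.track_html WHERE track_id = ?", (hits[0][0],)
).fetchone()[0]

conn.close()
```

The design point is that the LIKE search never has to page through the HTML, since it lives in a different file; the HTML is looked up by primary key only after a hit is chosen.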
