What is the best way to store downloaded files?

Sorry for the bad title.

I'm saving web pages. I currently use one XML file as an index. Each element contains the file's creation date (UTC) and the full URL (with query string and so on). The headers go in a separate file with a similar name but a special extension appended.

However, at 40k files (including header files), the XML is now 3.5 MB. Until recently I would read the XML file, add the new entry, and save it again each time. Now I keep it in memory and save it every once in a while.

When I request a page, the URL is looked up using XPath on the XML file; if there is an entry, the file path is returned.

The directory structure is .\\www.host.com/randomFilename.randext

So I am looking for a better way.

I'm thinking:

  • One XML file per domain (including subdomains). But I feel this might be a hassle.
  • Using SVN. I just tested it, but I have no experience with large repositories. I would execute svn add "path to file" for every download and commit when I'm done.
  • Creating a custom file system, where I can include everything I want, for example POST data.
  • Generating a filename from the URL and somehow flattening the query string, but large query strings might be rejected by the OS. And if I keep the headers alongside, I still need to keep track of multiple files mapped to each different query string. Hassle. I don't want it to run too slowly either.
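One possible way around the file-name length concern in the last bullet is to hash the full URL, so that any query string maps to a fixed-length name. The following is only a minimal sketch in Python: the helper names `cache_path` and `header_path`, the use of SHA-256, and the `.hdr` extension are all assumptions for illustration, not part of the setup described above.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit


def cache_path(root: str, url: str) -> Path:
    """Map a URL to <root>/<host>/<SHA-256 hex digest of the full URL>.

    The digest is always 64 hex characters, so a long query string
    cannot push the file name past the operating system's limit.
    """
    host = urlsplit(url).hostname or "unknown-host"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(root) / host / digest


def header_path(body_path: Path) -> Path:
    # Hypothetical convention: headers sit next to the body, with ".hdr" appended.
    return body_path.with_name(body_path.name + ".hdr")
```

Because the name is derived from the URL itself, each distinct query string gets its own body and header file without a separate lookup table. The trade-off is that file names are no longer human-readable.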

Multiple program instances will perform read/write operations, on different computers.

If I follow the directory/file method, I could in theory add a layer in between so that it uses DotNetZip on the fly. But then again, there is the query string problem.

I'm just looking for direction or experience here.

What I also want is the ability to keep a history of these files, so that the local file is not overwritten and I can pick which version (by date) I want. That's why I tried SVN.

I would recommend either a relational database or a version control system.

You might want to use SQL Server 2008's new FILESTREAM feature to store the files themselves in the database.

I would use two data stores, one for the raw files and another for the indexes.

To store the flat files, I think Berkeley DB is a good choice. The key can be generated by MD5 or another hash function, and you can also compress the file contents to save some disk space.
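The idea of a hashed key plus compressed content can be sketched roughly as follows. Berkeley DB bindings are not part of Python's standard library, so this sketch swaps in the standard `dbm` module as a stand-in key-value store; the function names `url_key`, `put_page`, and `get_page` are hypothetical.

```python
import dbm
import hashlib
import zlib


def url_key(url: str) -> bytes:
    # MD5 serves only as a compact lookup key here, not as a security measure.
    return hashlib.md5(url.encode("utf-8")).hexdigest().encode("ascii")


def put_page(db_path: str, url: str, body: bytes) -> None:
    """Store the compressed page body under the hash of its URL."""
    with dbm.open(db_path, "c") as db:  # "c" creates the store if it is missing
        db[url_key(url)] = zlib.compress(body)


def get_page(db_path: str, url: str):
    """Return the decompressed body, or None if the URL was never stored."""
    with dbm.open(db_path, "c") as db:
        key = url_key(url)
        return zlib.decompress(db[key]) if key in db else None
```

Note that this keeps only one copy per URL. Keeping several dated versions would need the fetch time folded into the key or tracked in a separate index.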

For indexes, you can use a relational database or a more sophisticated text search engine like Lucene.
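As a rough illustration of the relational-index option, the sketch below uses SQLite (chosen only because it ships with Python; any relational database would do) to map each URL to one row per download, which also covers picking a version by date. The table layout and the names `open_index`, `add_version`, and `find_version` are assumptions, not something prescribed above.

```python
import sqlite3


def open_index(db_file: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_file)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url         TEXT NOT NULL,
               fetched_utc TEXT NOT NULL,  -- ISO 8601 text sorts in time order
               file_path   TEXT NOT NULL,
               PRIMARY KEY (url, fetched_utc)
           )"""
    )
    return conn


def add_version(conn, url, fetched_utc, file_path):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pages (url, fetched_utc, file_path) VALUES (?, ?, ?)",
            (url, fetched_utc, file_path),
        )


def find_version(conn, url, as_of_utc=None):
    """Return the newest stored path, or the newest one fetched at or before as_of_utc."""
    if as_of_utc is None:
        row = conn.execute(
            "SELECT file_path FROM pages WHERE url = ? "
            "ORDER BY fetched_utc DESC LIMIT 1",
            (url,),
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT file_path FROM pages WHERE url = ? AND fetched_utc <= ? "
            "ORDER BY fetched_utc DESC LIMIT 1",
            (url, as_of_utc),
        ).fetchone()
    return row[0] if row else None
```

The composite primary key means a new download adds a row instead of overwriting the old one, and the lookup uses the key's index, so nothing has to be held in memory or rewritten wholesale.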

