

Would it be feasible to store/use an array with 200 thousand items?

I have 239 text files to process in an application. At the moment the files are stored as resources: when I need to read the content of one of them, I load just that resource into an array, and when I have finished reading the content I set the array to Nothing. I do the same for each of the other files as I need them.

Would it be better if I loaded the contents of all 239 files into an array when the application starts? That would add up to an array of about 200,000 items.

What I actually plan to do in the future is to generate an XML file that contains the contents of all 239 text files, and then load that XML into an object so I can manage the data through attributes/properties. At the moment I have no idea how to generate that XML or how to read it properly, but for now it would help to know whether an object/array of 200,000 lines/elements would be easier to manage than reading individual files or an XML file.
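As a sketch of that future direction, here is one way such an XML bundle could be generated and read back. This is only an illustration (shown in Python with its standard library; the element and attribute names are made up, not anything your application already defines):

```python
import xml.etree.ElementTree as ET

def files_to_xml(files):
    """Bundle a {filename: list-of-lines} mapping into one XML document."""
    root = ET.Element("files")
    for name, lines in files.items():
        file_el = ET.SubElement(root, "file", name=name)
        for line in lines:
            ET.SubElement(file_el, "line").text = line
    return ET.tostring(root, encoding="unicode")

def xml_to_files(xml_text):
    """Read the bundle back into a {filename: list-of-lines} mapping."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): [ln.text or "" for ln in f]
            for f in root.findall("file")}

# Round-trip: everything written out comes back unchanged.
data = {"a.txt": ["hello", "world"], "b.txt": ["foo"]}
assert xml_to_files(files_to_xml(data)) == data
```

The same shape works in .NET with classes such as XmlWriter/XmlReader or XML serialization; the point is only that one document can carry all 239 files and be loaded in a single pass.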

The actual question is not how many items but what the total size of the items is. 10 MB is no problem on a system with 4 GB or more of RAM, and you can certainly trust the system to page memory out as needed. Loading all the data into an array will almost certainly speed up operations on it, and you will avoid constantly resizing the array.

So my opinion on this matter is that you are better off loading all the items, if you want to reduce disk load and improve processing performance.

Even at 10 MB, why take the memory hit?
Don't change your approach for scale unless you have a specific performance issue.

With an array you need to set the size when you create it.
Do you know the size up front?

I would process one file at a time.
If each file is the same size, you can just reuse the same array for every file.

If you don't know the size of each file, reuse a single List, one file at a time: a List grows its Capacity as needed, and Clear does not release that capacity, so you avoid the cost of repeated resizing.
Up front, set its size to roughly what you expect the largest single file to be.

List.Capacity Property
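In spirit, the one-file-at-a-time approach looks like the following (a Python sketch; in .NET the equivalent would be a single List(Of String) that you Clear() between files so its backing storage is reused):

```python
def process_all(paths, handle_line):
    """Process files one at a time, reusing a single buffer list."""
    buf = []
    for path in paths:
        buf.clear()                        # empty the buffer for the next file
        with open(path, encoding="utf-8") as f:
            buf.extend(f)                  # hold only this file's lines
        for line in buf:
            handle_line(line.rstrip("\n"))
```

Only one file's contents are ever in memory at once, which keeps the peak footprint at the size of the largest file rather than the sum of all 239.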

There is one thing we tend to forget: Windows' excellent caching feature. If you don't want to handle caching yourself and the reading procedure is fairly fast (e.g. one single slurp), you might simply ignore caching.

In other cases, I would recommend an extremely easy-to-implement "MRU cache" (MRU = most recently used). These are quite effective and can be implemented in a couple of minutes.

Say you want to keep the 20 most often used files. Simply create a list. The list will hold the contents of each file (in an array) together with the filename.

Each time you attempt to (re-)read the contents of a file, look in the list first. If the file is in the list, move its contents to the front of the list and return them. If it is not in the list, read the file and put its contents at the front of the list. If the list now contains more than 20 elements, discard the last element.

You can increase '20' to suit your needs, and you will always have the 20 most recently used files in memory.

Here's some pseudo code:

FileContents ReadFile(filename)
   i = List.IndexOf(filename)
   if (i == NOTFOUND)
     content = PhysicallyRead(filename)
   else
     content = List[i]
     List.RemoveItemAt(i)
   end
   List.InsertAt(0, content, filename)
   if (List.Length > MAXLENGTH)
     List.RemoveItemAt(MAXLENGTH)
   end
end

I hope you get the idea. The only thing you have to take care of is that the list operations are fairly fast.

Using a database instead is generally the best approach, especially when you have to filter the file contents, do calculations on a row-by-row basis, and so on. However, if speed is really crucial, an in-memory solution might be better.

But if you want to keep it simple and extensible, consider using an embedded database solution (such as SQLite, embedded Firebird, or embedded SQL Server), as mentioned in the previous answers.
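For instance, with SQLite the whole corpus could be keyed by filename and line number and then filtered with plain SQL. This is only a sketch (Python's built-in sqlite3 module; the schema is an assumption, not something from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE lines (file TEXT, lineno INTEGER, text TEXT)")

def store_file(name, lines):
    """Insert one file's lines, keyed by filename and line number."""
    conn.executemany(
        "INSERT INTO lines VALUES (?, ?, ?)",
        [(name, i, line) for i, line in enumerate(lines)])

def load_file(name):
    """Read one file's lines back in their original order."""
    rows = conn.execute(
        "SELECT text FROM lines WHERE file = ? ORDER BY lineno", (name,))
    return [r[0] for r in rows]

store_file("a.txt", ["hello", "world"])
assert load_file("a.txt") == ["hello", "world"]
```

Row-by-row filtering then becomes a WHERE clause instead of hand-written loops over arrays.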

Hope this helps a bit.
