
Would it be feasible to store/use an array with 200 thousand items?

I have 239 text files to process in an application. At the moment the files are stored as resources: when I need to read one, I load just that resource into an Array, and when I have finished reading it I set the Array to Nothing. I do the same for each of the other files when I need them.

Would it be better to load the contents of all 239 text files into one Array when the application starts? That would make an Array of about 200,000 items.

What I really plan to do in the future is generate an XML file containing the contents of all 239 text files, and then load that XML into an object to manage the attributes/properties. At the moment I have no idea how to generate the XML or how to read it properly. For now, it would be great just to know whether an object/array of 200,000 lines/elements would be better to work with than reading individual files or XML.

The real question is not how many items there are but what the total size of the items is. 10 MB is no problem on a system with 4 GB or more, and you can certainly trust the system to page memory out as needed. Loading all the data into an array will most likely speed up operations on it, and you will avoid constantly resizing the array.
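To put a number on "total size", a rough measurement is easy to script. The following is a minimal Python sketch, not a measurement of the application in the question: the 200,000 generated lines of 40 characters are hypothetical stand-ins for the real file contents, and the helper name `estimate_bytes` is made up for illustration. The figure it reports depends on the runtime and on the real line lengths.

```python
import sys

def estimate_bytes(lines):
    """Approximate memory used by a list of strings: the list plus each string."""
    return sys.getsizeof(lines) + sum(sys.getsizeof(s) for s in lines)

# Hypothetical data: 200,000 distinct lines of 40 characters each.
lines = [f"{i:040d}" for i in range(200_000)]

total = estimate_bytes(lines)
print(f"{len(lines)} lines, about {total / 1_000_000:.1f} MB")
```

Running the same kind of measurement against the real files is the quickest way to find out whether the total is closer to a few megabytes or to something that matters.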

So, in my opinion, loading all the items up front is the better choice if you want to lighten the disk load and improve processing performance.

Even at 10 MB, why take the memory hit?
Don't change the scale unless you have a specific performance issue.

With an array you need to set the size when you create it.
Do you know the size up front?

I would process one file at a time.
If the size of each file is the same then you can just re-use the same array for each file.

If you don't know the size of each file, reuse a List one file at a time: it grows to the Capacity it needs, and Clear does not release that capacity, so you do not take the hit of resizing again.
Set the initial size to about what you expect for the largest single file.

List.Capacity Property

There is one thing we tend to forget: Windows' excellent file caching. If the reading procedure is fairly fast (e.g. one single slurp), you might not need to do any caching yourself.

In other cases, I would recommend an extremely easy-to-implement "MRU cache" (MRU = most recently used). They are quite effective and can be implemented in a couple of minutes.

Say you want to keep the 20 most recently used files. Simply create a list. The list will hold the contents of each file (in an array) and the filename.

Each time you attempt to read (or re-read) the contents of a file, look in the list first. If it is in the list, move the entry to the front of the list and return the contents. If it is not in the list, read the file and put it at the front of the list. If the list now contains more than 20 elements, discard the last element of the list.

You can change '20' to suit your needs, and you will always have the 20 most recently used files in memory.

Here's some pseudo code:

FileContents ReadFile(filename)
   i = List.IndexOf(filename)
   if (i == NOTFOUND)
     content = PhysicallyRead(filename)
   else
     content = List[i]
     List.RemoveItemAt(i)
   end
   List.InsertAt(0, content, filename)   // most recently used entry goes first
   if (List.Length > MAXLENGTH)
     List.RemoveItemAt(MAXLENGTH)        // drop the least recently used entry
   end
   return content

I hope you get the idea. The only thing you have to take care of is that the list operations are fairly fast.
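For a runnable version of the same idea, here is a minimal Python sketch. It swaps the plain list for `collections.OrderedDict` so that lookup, reordering and eviction are all cheap, which addresses the concern about list operations being fast. The class name `MruFileCache` and the `reader` callable are assumptions made for this example; `reader` stands in for whatever actually loads a file or resource.

```python
from collections import OrderedDict

class MruFileCache:
    """Keeps the contents of the most recently used files in memory."""

    def __init__(self, reader, max_entries=20):
        self._reader = reader            # callable: filename -> contents
        self._max_entries = max_entries
        self._entries = OrderedDict()    # most recently used entry is last

    def read(self, filename):
        if filename in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(filename)
            return self._entries[filename]
        # Cache miss: read the file and remember its contents.
        contents = self._reader(filename)
        self._entries[filename] = contents
        if len(self._entries) > self._max_entries:
            # Drop the least recently used entry (the first one).
            self._entries.popitem(last=False)
        return contents

def read_lines(filename):
    """Example reader: loads a text file as a list of lines."""
    with open(filename, encoding="utf-8") as f:
        return f.read().splitlines()

cache = MruFileCache(read_lines, max_entries=20)
# cache.read("some_file.txt") would read from disk the first time only,
# for as long as the entry stays among the 20 most recently used.
```

Passing the reader in as a parameter keeps the cache independent of where the data comes from, so the same class works for files on disk or embedded resources.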

Using a database instead is generally the best approach, especially when you have to filter the file contents, do calculations on a row-by-row basis and so on. However, if speed is really crucial, an in-memory solution might be better.

But if you want to keep it simple and extensible, consider using an embedded database solution (such as SQLite, Firebird embedded, SQL-Server embedded) as mentioned by the previous answers.
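As an illustration of the embedded-database route, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table layout, the `find_lines` helper and the two tiny sample files are all assumptions made for the example, not something taken from the question.

```python
import sqlite3

# ":memory:" keeps everything in RAM; pass a file path instead to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (file_name TEXT, line_no INTEGER, content TEXT)")
conn.execute("CREATE INDEX idx_lines_file ON lines (file_name)")

# Hypothetical contents standing in for the real text files.
files = {
    "a.txt": ["alpha", "beta"],
    "b.txt": ["gamma", "alpha beta"],
}
rows = [
    (name, line_no, text)
    for name, contents in files.items()
    for line_no, text in enumerate(contents, start=1)
]
conn.executemany("INSERT INTO lines VALUES (?, ?, ?)", rows)
conn.commit()

def find_lines(conn, needle):
    """Return (file_name, line_no) for every line containing `needle`."""
    cur = conn.execute(
        "SELECT file_name, line_no FROM lines "
        "WHERE content LIKE ? ORDER BY file_name, line_no",
        (f"%{needle}%",),
    )
    return cur.fetchall()

print(find_lines(conn, "alpha"))
```

Once the lines are in a table, filtering and per-row calculations become queries rather than hand-written loops over arrays.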

Hope this helps a bit.

