
Caching a binary file in C#

Is it possible to cache a binary file in .NET and perform normal file operations on the cached file?

The way to do this is to read the entire contents from the FileStream into a MemoryStream object, and then use that object for I/O later on. Both types inherit from Stream, so the usage will be effectively identical.

Here's an example:

private MemoryStream cachedStream;

public void CacheFile(string fileName)
{
    cachedStream = new MemoryStream(File.ReadAllBytes(fileName));
}

So just call the CacheFile method once when you want to cache the given file, and then use cachedStream for reading anywhere else in the code. (The actual file will have been closed as soon as its contents were cached.) The only thing to remember is to dispose of cachedStream when you're finished with it.
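A short usage sketch of the approach above. The class name `FileCache` and the helper `ReadInt32At` are hypothetical, introduced only for illustration. Because every reader shares the stream's single `Position`, the sketch sets it explicitly before each read and leaves the stream open when the reader is disposed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical wrapper around the CacheFile method shown above.
public sealed class FileCache : IDisposable
{
    private MemoryStream cachedStream;

    public void CacheFile(string fileName)
    {
        // ReadAllBytes opens, reads, and closes the file in one call.
        cachedStream = new MemoryStream(File.ReadAllBytes(fileName));
    }

    // Reads a little-endian 32-bit integer at the given offset of the cached copy.
    public int ReadInt32At(long offset)
    {
        cachedStream.Position = offset; // the position is shared, so set it explicitly
        using (var reader = new BinaryReader(cachedStream, Encoding.UTF8, leaveOpen: true))
        {
            return reader.ReadInt32();
        }
    }

    public void Dispose()
    {
        cachedStream?.Dispose();
    }
}
```

Note that a MemoryStream built from an existing byte array cannot grow, so this form suits read-mostly use.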

Any modern OS has a caching system built in, so whenever you interact with a file, you are in fact usually interacting with the OS's in-memory cache of that file.

Before applying custom caching, you need to ask an important question: what happens when the underlying file changes and the cached copy becomes invalid?
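One possible answer to that question, as a minimal sketch: compare the file's last-write time with the time recorded when the cache was loaded, and reload on a mismatch. The class name `ReloadingFileCache` is hypothetical, and the sketch assumes that a timestamp check per open is acceptable and that the file is not being rewritten at the moment it is read.

```csharp
using System;
using System.IO;

// Hypothetical cache that reloads the file when its last-write time changes.
public sealed class ReloadingFileCache
{
    private readonly string path;
    private byte[] contents;
    private DateTime loadedWriteTimeUtc;

    public ReloadingFileCache(string path)
    {
        this.path = path;
        Reload();
    }

    // Returns a fresh read-only stream over the current cached bytes.
    public MemoryStream OpenRead()
    {
        if (File.GetLastWriteTimeUtc(path) != loadedWriteTimeUtc)
        {
            Reload(); // the file changed on disk, so the cached copy is stale
        }
        return new MemoryStream(contents, writable: false);
    }

    private void Reload()
    {
        loadedWriteTimeUtc = File.GetLastWriteTimeUtc(path);
        contents = File.ReadAllBytes(path);
    }
}
```

Handing out a new MemoryStream per caller also gives each caller its own position. A FileSystemWatcher is an alternative when polling the timestamp is not desirable.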

Matters get more complicated still if the cached copy is allowed to change and the changes need to be saved back to the underlying file.

If the file is small, it's simpler just to use MemoryStream as suggested in another answer.

If you need to save changes back to the file, you could write a wrapper class that forwards everything on to MemoryStream, but additionally has an IsDirty property that it sets to true whenever a write operation is performed. Then you can have some management code that kicks in whenever you choose (at the end of some larger transaction?), checks for (IsDirty == true) and saves the new version to disk. This is called "lazy write" caching, as the modifications are made in memory and are not actually saved until some time later.
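The wrapper described above might look roughly like this. It is a sketch rather than a complete implementation: the names `DirtyTrackingStream` and `SaveIfDirty` are hypothetical, the whole file is rewritten on save, and there is no locking. The inner stream is created empty and then filled, so that it stays resizable.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: forwards to an inner MemoryStream and records writes.
public sealed class DirtyTrackingStream : Stream
{
    private readonly MemoryStream inner = new MemoryStream(); // resizable
    private readonly string path;

    public DirtyTrackingStream(string path)
    {
        this.path = path;
        byte[] bytes = File.ReadAllBytes(path);
        inner.Write(bytes, 0, bytes.Length);
        inner.Position = 0;
    }

    public bool IsDirty { get; private set; }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => true;
    public override long Length => inner.Length;

    public override long Position
    {
        get => inner.Position;
        set => inner.Position = value;
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        inner.Read(buffer, offset, count);

    public override long Seek(long offset, SeekOrigin origin) =>
        inner.Seek(offset, origin);

    public override void Write(byte[] buffer, int offset, int count)
    {
        inner.Write(buffer, offset, count);
        IsDirty = true;
    }

    public override void SetLength(long value)
    {
        inner.SetLength(value);
        IsDirty = true;
    }

    // Flush stays in memory; saving to disk is a separate, explicit step.
    public override void Flush() { }

    // Called by the management code, for example at the end of a transaction.
    public void SaveIfDirty()
    {
        if (!IsDirty) return;
        File.WriteAllBytes(path, inner.ToArray());
        IsDirty = false;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }
}
```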

If you really want to complicate matters, or you have a very large file, you could implement your own paging, where you pick a buffer size (maybe 1 MB?) and hold a small number of byte[] pages of that fixed size. This time you'd have a dirty flag for each page. You'd implement the Stream methods so that they hide the details from the caller and pull in (or discard) page buffers whenever necessary.
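A rough sketch of the paging idea, under several simplifying assumptions: the file has a fixed length, access is one byte at a time, eviction is least-recently-used, and nothing is thread-safe. The class `PagedFileCache` and all of its members are hypothetical; a full version would derive from Stream and route Read and Write through the same page lookup.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical paged cache: keeps at most maxPages fixed-size pages in memory.
public sealed class PagedFileCache : IDisposable
{
    private sealed class Page
    {
        public byte[] Data;
        public bool Dirty;
    }

    private readonly FileStream file;
    private readonly int pageSize;
    private readonly int maxPages;
    private readonly Dictionary<long, Page> pages = new Dictionary<long, Page>();
    private readonly LinkedList<long> recentlyUsed = new LinkedList<long>(); // most recent first

    public PagedFileCache(string path, int pageSize = 1 << 20, int maxPages = 8)
    {
        file = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
        this.pageSize = pageSize;
        this.maxPages = maxPages;
    }

    public byte ReadByte(long position)
    {
        return GetPage(position).Data[(int)(position % pageSize)];
    }

    public void WriteByte(long position, byte value)
    {
        Page page = GetPage(position);
        page.Data[(int)(position % pageSize)] = value;
        page.Dirty = true; // written back on eviction or Flush
    }

    private Page GetPage(long position)
    {
        if (position < 0 || position >= file.Length)
            throw new ArgumentOutOfRangeException(nameof(position));

        long index = position / pageSize;
        if (pages.TryGetValue(index, out Page page))
        {
            recentlyUsed.Remove(index); // O(n), acceptable for a handful of pages
            recentlyUsed.AddFirst(index);
            return page;
        }
        if (pages.Count >= maxPages) EvictOldest();

        // Pull the page in from disk; the last page may be only partly filled.
        page = new Page { Data = new byte[pageSize] };
        file.Position = index * pageSize;
        int total = 0, read;
        while (total < pageSize && (read = file.Read(page.Data, total, pageSize - total)) > 0)
            total += read;
        pages[index] = page;
        recentlyUsed.AddFirst(index);
        return page;
    }

    private void EvictOldest()
    {
        long index = recentlyUsed.Last.Value;
        recentlyUsed.RemoveLast();
        WriteBack(index, pages[index]);
        pages.Remove(index);
    }

    private void WriteBack(long index, Page page)
    {
        if (!page.Dirty) return;
        long start = index * pageSize;
        int count = (int)Math.Min(pageSize, file.Length - start);
        file.Position = start;
        file.Write(page.Data, 0, count);
        page.Dirty = false;
    }

    public void Flush()
    {
        foreach (var entry in pages) WriteBack(entry.Key, entry.Value);
        file.Flush();
    }

    public void Dispose()
    {
        Flush();
        file.Dispose();
    }
}
```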

Finally, if you want an easier life, try:

http://www.microsoft.com/Sqlserver/2005/en/us/compact.aspx

It lets you use the same SQL engine as SQL Server, but on a file, with everything happening inside your process instead of via an external RDBMS server. This will probably give you a much simpler way of querying and updating your file, and avoid the need for a lot of hand-written persistence code.

Well, you can of course read the file into a byte[] array and start working on it. And if you want to use a stream, you can copy your FileStream into a MemoryStream and start working with that, for example:

public static void CopyStream( Stream input, Stream output )
{
        var buffer = new byte[32768];
        int readBytes;
        while( ( readBytes = input.Read( buffer, 0, buffer.Length ) ) > 0 )
        {
                output.Write( buffer, 0, readBytes );
        }
}
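On .NET Framework 4 and later, the built-in Stream.CopyTo method performs the same loop, so the helper above may not be needed. The sketch below uses it to load a file into a resizable MemoryStream; the names `StreamCaching` and `LoadIntoMemory` are hypothetical.

```csharp
using System.IO;

public static class StreamCaching
{
    // Loads a whole file into a resizable MemoryStream and rewinds it.
    public static MemoryStream LoadIntoMemory(string path)
    {
        var memory = new MemoryStream();
        using (var file = File.OpenRead(path))
        {
            file.CopyTo(memory); // built-in equivalent of the CopyStream loop above
        }
        memory.Position = 0; // CopyTo leaves the position at the end
        return memory;
    }
}
```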

If you are concerned about performance, the built-in mechanisms of the different file access methods should normally be enough.

I don't know what exactly you're doing, but I offer this suggestion (which may or may not be viable depending on what you're doing):

Instead of only caching the contents of the file, why don't you put the contents of the file in a nice strongly typed collection of items, and then cache that? It'll probably make searching for items a bit easier, and faster too, since there is no parsing involved on each lookup.
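As an illustration of this suggestion, the sketch below parses a file once into a dictionary of typed items. The record layout is purely an assumption made for the example (a 32-bit Id followed by a length-prefixed string, as written by BinaryWriter), and the names `Item` and `ItemCache` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical record type; the real layout depends on your file format.
public sealed class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ItemCache
{
    private static Dictionary<int, Item> itemsById = new Dictionary<int, Item>();

    // Parses the file once. Assumes each record was written with
    // BinaryWriter.Write(int) followed by BinaryWriter.Write(string).
    public static void Load(string path)
    {
        var items = new Dictionary<int, Item>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                var item = new Item { Id = reader.ReadInt32(), Name = reader.ReadString() };
                items[item.Id] = item;
            }
        }
        itemsById = items;
    }

    // Looks an item up without touching the file; returns null if absent.
    public static Item Find(int id)
    {
        return itemsById.TryGetValue(id, out Item item) ? item : null;
    }
}
```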

There is a very elegant caching system in Lucene that caches bytes from the disk into memory and intelligently updates the store, etc. You might want to have a look at that code to get an idea of how they do it. You might also want to read up on the Microsoft SQL Server data storage layer, as the MSSQL team is pretty forthcoming about some of the more crucial implementation details.
