
How can I read and write one file at the same time from different threads on Android/iOS?

I have lots of small files. To save file handles and improve I/O efficiency, these files are packed into a single big file. However, for various reasons, these small files must be updatable at runtime, so different threads need to update and read different parts of the big file at the same time.

Because of the memory limit, mmap is not a good choice, so I have to implement this myself. But I'm concerned about whether it is safe to read and write different parts of a single file at the same time on iOS/Android. How can I make sure that a block which is being written will not be read by another thread?

Should I implement the whole feature with thread locks, or is there already some mature technique that does the same job?

By the way, I use C++ for my project, but Java and Obj-C are also options.

Use case example:

My project is an RPG game. When players see an item that is not stored in the original package, the game downloads it from the server and saves it to disk automatically and immediately.

Each item corresponds to a single file of roughly 300KB to 1.5MB. There are 3000 to 5000 items on the server. In the worst case, players will save thousands of files locally.

The good thing is that my users can load items on demand to save storage, and when updating, only changed items are redownloaded. But thousands of files lead to a high risk of running out of file descriptors (FDs) or other resources.

That's why I would like to pack these small files into a single big package file. But I still want to keep the ability to update/add a single file.

In short, yes: locks are still the best way to handle this, and they will remain an important tool in every developer's toolbelt.

This kind of problem is common, and there are many approaches to solving it, which almost makes this answer opinion-based. I'll sprinkle my opinions here and there, but you will need to make your own decisions based on what's best or easiest for you.

First of all, managing a huge file that holds many small items of variable size, creating and deleting them on the fly from multiple threads, seems to me as complex as designing and implementing a file system. And I see no advantages compared to the approaches below, except that it might be blazing fast. But trust me, you neither need nor want to go that route.

So I won't exactly answer your original question, instead I'd like to show you a less risky way to go around your problem.

For practical purposes I'll refer to the game items as assets. I'll also assume these assets are not meant to be used directly by the GPU, such as textures, which may need a different approach that I'm not experienced in.

=========

1- Network cache approach

  • Find a library that caches network requests.
  • Every time you need an asset, pretend you're getting it from the network, and the library gives you the binary. The first time, it requests the asset from the server; after that, it's likely to find a copy in its cache.

ups: very simple and quick to set up. Configure a cache size, and old objects are evicted on an LRU (least recently used) basis. If the server is set up properly, your app knows whether it has the latest version of an asset or a newer one needs to be downloaded. And there's no need to care about locks and thread safety.

downs: it can be very inefficient if you set up the cache strategy wrong or your server doesn't expose the caching headers correctly.

For this approach I can suggest OkHttp version 4, which is written in Kotlin and is trivial to use from Java or Kotlin on Android. Be aware, though, that OkHttp 4 is a JVM library: it does not run on iOS as-is, and reaching it from C / C++ on Android means going through JNI (I haven't tried that personally).

There are certainly other libraries around, but I don't know of one that covers both C and Java/JVM, so you may end up using a different cache on each platform.

=========

2- track individual assets separately

You may need a central class that knows whether each asset is available, not available, or downloading. You'll also need it to check for newer versions from time to time, and occasionally to delete some assets to save space.

That's a lot of information to keep track of for each asset. I feel the natural approach is to use a database to track that state.

Now you have 2 options. You can store the asset in the database as a blob, or you can generate a unique filename, save the asset to disk yourself, and store the filename in the database. I strongly suggest the latter: it will make debugging much easier and is way less risky.

Alternatively, you can write a class that is created when the app starts, scans the available files and versions, and holds all that info in memory.
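To make that in-memory alternative more concrete, here is a minimal C++17 sketch. The `AssetRegistry` class, the `AssetState` values, and the `<id>.bin` naming scheme are all hypothetical. It also shows one common way to stop readers from ever seeing a half-written asset when each asset is its own file: write to a temporary file, then rename it over the final name, because `rename()` replaces the target atomically on POSIX systems such as Android and iOS. That makes the swap atomic for other readers; it does not by itself make the data survive a crash (that would also need an fsync).

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical per-asset state; an id missing from the map means "not available".
enum class AssetState { Available, Downloading };

struct AssetInfo {
    AssetState state;
    std::uintmax_t size;  // bytes on disk (0 while downloading)
};

// Sketch of an in-memory registry built once at startup by scanning the asset
// directory. Assumption: one file per asset, named "<id>.bin".
class AssetRegistry {
public:
    explicit AssetRegistry(fs::path dir) : dir_(std::move(dir)) {
        fs::create_directories(dir_);
        for (const auto& entry : fs::directory_iterator(dir_)) {
            if (entry.is_regular_file() && entry.path().extension() == ".bin")
                assets_[entry.path().stem().string()] = {AssetState::Available, entry.file_size()};
        }
    }

    std::optional<AssetInfo> find(const std::string& id) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = assets_.find(id);
        if (it == assets_.end()) return std::nullopt;
        return it->second;
    }

    // Returns false if the asset is already on disk or another thread is downloading it.
    bool beginDownload(const std::string& id) {
        assert(!id.empty());
        std::lock_guard<std::mutex> lock(mu_);
        return assets_.emplace(id, AssetInfo{AssetState::Downloading, 0}).second;
    }

    // Writes to "<id>.bin.tmp" first, then renames it over "<id>.bin", so a
    // reader that opens "<id>.bin" sees either the old or the new complete file.
    void finishDownload(const std::string& id, const std::string& bytes) {
        fs::path tmp = dir_ / (id + ".bin.tmp");
        {
            std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
            out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
            if (!out) throw std::runtime_error("failed to write " + tmp.string());
        }
        fs::rename(tmp, pathOf(id));
        std::lock_guard<std::mutex> lock(mu_);
        assets_[id] = {AssetState::Available, bytes.size()};
    }

    fs::path pathOf(const std::string& id) const { return dir_ / (id + ".bin"); }

private:
    fs::path dir_;
    mutable std::mutex mu_;
    std::unordered_map<std::string, AssetInfo> assets_;
};
```

`beginDownload` doubles as a guard: only the thread that gets `true` downloads the asset, so two threads that want the same missing item don't fetch it twice.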

ups: each asset is stored individually, either as a file on disk or as a blob. You can keep track of how many times you used it and come up with strategies to delete assets if you want to.

downs: choosing a database can take a long time. That said, SQLite and Realm both work on Android and iOS, so you can potentially share some code.

While researching this answer I found a very interesting article claiming that on some OSs (including Android), reading small stored blobs (about 10kB) from SQLite is faster than reading them from individual files on disk; its headline figure is about 35% faster. An interesting surprise, but in my opinion not worth switching to blobs for this gain alone, since reading many blobs in parallel may create a bottleneck on the database. https://www.sqlite.org/fasterthanfs.html

You only need as many file descriptors as there are assets currently being read from disk: once an asset has been read, keep it in memory and close the fd.
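A minimal sketch of the "read it, keep it in memory, close the fd" idea, assuming plain C stdio is acceptable for your asset files. `readWholeFile` and `FileCloser` are hypothetical names. The RAII wrapper releases the descriptor on every return path, so the number of open fds tracks reads in flight rather than the number of assets on disk.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Closes the FILE* (and with it the underlying fd) when the owner goes away.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Reads a whole asset file into memory. The descriptor is open only for the
// duration of this call.
std::optional<std::vector<char>> readWholeFile(const std::string& path) {
    FilePtr f(std::fopen(path.c_str(), "rb"));
    if (!f) return std::nullopt;

    std::vector<char> data;
    char buf[64 * 1024];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f.get())) > 0) {
        data.insert(data.end(), buf, buf + n);
    }
    if (std::ferror(f.get())) return std::nullopt;
    return data;  // f is destroyed here, which calls fclose()
}
```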

===============

3- network cache, but with an in-memory cache

This is an optimisation on top of (1), in case something gets too slow. But as with all performance optimisations, I strongly suggest you measure before spending time on it, so that in the end you KNOW how much time you saved and whether it's worth the extra maintenance once you're done and have forgotten how it works.

Here you roll your own class that can hold, say, 50 assets in memory for very fast access. When it doesn't have an asset, it asks the network library for it.
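A minimal sketch of such a class, assuming the network cache library is hidden behind a `Loader` callback (a hypothetical name; plug in whatever call your library offers). `AssetMemoryCache` keeps at most `capacity` assets, evicts the least recently used one, and hands out `shared_ptr`s so that an evicted asset stays alive while a caller is still using it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<char>;

// Small thread-safe LRU cache in front of a slower loader.
class AssetMemoryCache {
public:
    using Loader = std::function<std::shared_ptr<const Bytes>(const std::string&)>;

    AssetMemoryCache(std::size_t capacity, Loader loader)
        : capacity_(capacity), loader_(std::move(loader)) {
        assert(capacity_ > 0);
    }

    std::shared_ptr<const Bytes> get(const std::string& id) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = index_.find(id);
            if (it != index_.end()) {
                // Hit: move the entry to the front (most recently used).
                order_.splice(order_.begin(), order_, it->second);
                return it->second->second;
            }
        }
        // Miss: load without holding the lock so other lookups aren't blocked.
        std::shared_ptr<const Bytes> data = loader_(id);
        if (!data) return nullptr;

        std::lock_guard<std::mutex> lock(mu_);
        auto it = index_.find(id);
        if (it != index_.end()) {  // another thread loaded it in the meantime
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        order_.emplace_front(id, data);
        index_[id] = order_.begin();
        if (order_.size() > capacity_) {
            // Evict the least recently used entry (back of the list).
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        return data;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return order_.size();
    }

private:
    using Entry = std::pair<std::string, std::shared_ptr<const Bytes>>;
    std::size_t capacity_;
    Loader loader_;
    mutable std::mutex mu_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Loading happens outside the lock, so two threads that miss on the same asset at the same moment may both load it; the second result is simply dropped. If that matters for your game, track in-flight loads as well.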

ups: it's more performant than (1) and less complex than (2). downs: it's still more complex than (1).

================

1001 - big file and mmap

Why did I number this option as 1001? Because they're in the order I'd recommend, and I'd really not recommend this approach.

I used mmap many years ago, so I hope I remember its details correctly. At best they apply only to Linux on the single-core machine where I used it, so please verify that you get the same behavior on the platforms you need.

If you create a 1GB file and mmap it, you're not going to consume 1GB of RAM, since that's only virtual memory. It does consume physical memory proportional to the number of pages brought in by page faults when you read from or write to the file.
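To illustrate the mechanics just described (not as a recommendation), here is a minimal POSIX sketch; `writeViaMmap` is a hypothetical helper. It sets the file size, maps the whole file with `MAP_SHARED`, writes through the mapping, and flushes with `msync`. The mapping only reserves address space; physical pages are faulted in as they are touched. It does nothing about the multi-core visibility question raised in the next paragraph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Illustration only: sets the file to exactly `fileSize` bytes (growing or
// shrinking it), maps all of it with MAP_SHARED, copies `len` bytes to
// `offset` through the mapping, and flushes the change back to the file.
bool writeViaMmap(const char* path, std::size_t fileSize, std::size_t offset,
                  const void* data, std::size_t len) {
    assert(offset + len <= fileSize);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return false;
    if (ftruncate(fd, static_cast<off_t>(fileSize)) != 0) {
        close(fd);
        return false;
    }
    void* base = mmap(nullptr, fileSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // an established mapping stays valid after its fd is closed
    if (base == MAP_FAILED) return false;

    std::memcpy(static_cast<char*>(base) + offset, data, len);
    bool ok = msync(base, fileSize, MS_SYNC) == 0;  // write dirty pages back
    munmap(base, fileSize);
    return ok;
}
```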

You don't need any locks to read or write a mmapped file. Simply read and write to it, and the next read mirrors the last write. Now, I did this back in 2004 on old single-core computers. How does it behave on modern multi-core CPUs, and how do you ensure that after core 1 writes to a memory position (i.e. a file region), core 2 reads the new value instead of the previous one? I have no idea, and I urge you not to implement this without learning it first.

You WILL need locks/semaphores and thread safety for your algorithm that allocates an offset for each asset. When your game asks for an asset, you need to determine whether you have it on disk, which also implies knowing where on disk it is. Let's call this "where" the offset. If you don't have it, you need to decide where to store it, download it, and record the offset somewhere. That's the part of your code that is prone to race conditions.
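A minimal sketch of that race-prone part, assuming a single process owns the package file. `PackageIndex`, `Extent`, `reserve`, and `publish` are hypothetical names. One mutex guards both the allocation cursor and the id-to-offset table. An updated asset is written to a freshly reserved region and only then published, which is one way (an assumption, not the only possible design) to guarantee that readers never look up a block that is still being written. Freed regions are never reused and the index is not persisted; a real implementation would need both.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Where one asset lives inside the big package file.
struct Extent {
    std::uint64_t offset;
    std::uint64_t size;
};

class PackageIndex {
public:
    explicit PackageIndex(std::uint64_t endOfFile = 0) : end_(endOfFile) {}

    // Step 1: reserve a fresh region for `size` bytes. No reader can see it yet.
    Extent reserve(std::uint64_t size) {
        assert(size > 0);
        std::lock_guard<std::mutex> lock(mu_);
        Extent e{end_, size};
        end_ += size;  // append-only: regions are never handed out twice
        return e;
    }

    // Step 2, after the bytes are fully written at e.offset: make the region
    // visible to readers, replacing any previous version of the asset.
    void publish(const std::string& id, Extent e) {
        std::lock_guard<std::mutex> lock(mu_);
        index_[id] = e;
    }

    // Readers only ever get extents that were published, i.e. fully written.
    std::optional<Extent> lookup(const std::string& id) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = index_.find(id);
        if (it == index_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::mutex mu_;
    std::uint64_t end_;  // first unused byte in the package file
    std::unordered_map<std::string, Extent> index_;
};
```

A writer calls `reserve`, writes the bytes at `offset`, then calls `publish`; a reader calls `lookup` and reads only the extent it gets back. The old region of an updated asset becomes dead space until some form of compaction reclaims it.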

ups: fast, though I'm not sure how much faster than the previous approaches. The first time you need an asset, you still have to wait for a page fault, which reads that file region from disk into physical memory.

downs: managing memory offsets and synchronizing page faults across cores will make you a better programmer, at the cost of a lot of time and tears. And from my experience, I'm pretty sure something weird will happen on either iOS or Android that doesn't behave as expected, as in the Stack Overflow question "Why does mmap fail on iOS?".

https://medium.com/i0exception/memory-mapped-files-5e083e653b1

=================

1002 - big file and lseek

Yes, there is yet another approach that I recommend even less. It's basically the above, but instead of reading and writing through mmap, you open one or more file descriptors for the same file and use lseek to read/write the file regions.
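If you do go down this road anyway, note that `lseek` followed by `read`/`write` on a shared descriptor is itself racy: the file position belongs to the descriptor, so two threads can move it under each other. POSIX `pread`/`pwrite` (available on both Android and iOS) take the offset as an argument and leave the file position alone, so several threads can share one descriptor. Below is a minimal sketch of loop-until-done wrappers with the hypothetical names `writeAt`/`readAt`; it swaps `lseek` for `pread`/`pwrite` and still relies on your own locking to decide which regions are safe to touch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Writes exactly `len` bytes at `offset` without touching the descriptor's
// file position, retrying on short writes and EINTR.
bool writeAt(int fd, const void* buf, std::size_t len, off_t offset) {
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;
        p += n;
        len -= static_cast<std::size_t>(n);
        offset += n;
    }
    return true;
}

// Reads exactly `len` bytes at `offset`; returns false on error or if the
// file ends before `len` bytes were read.
bool readAt(int fd, void* buf, std::size_t len, off_t offset) {
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = pread(fd, p, len, offset);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;  // error, or end of file reached
        p += n;
        len -= static_cast<std::size_t>(n);
        offset += n;
    }
    return true;
}
```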

It has all the disadvantages as the previous option and at best the same advantages.

Former gamedev here.

Fabio gave a pretty good and detailed answer. He's absolutely right about options 1001 and 1002. I totally would NOT take that approach.

A combination of 1 and 3 would be my preferred combo. You set a cache size and as new files are added to the cache, remove older ones.

Depending on your game design (open world? game levels?), you can have a preprocessing step that fetches all the files you need before a level (while showing a loading screen), making sure they are available locally and downloading them from the network if necessary. Re-reading your post, it appears you are already doing that?
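A minimal sketch of such a loading-screen step, assuming your asset layer exposes two hooks, `isLocal` and `download` (hypothetical names, passed in as callbacks). `prefetchLevel` downloads missing assets in batches of at most `maxParallel` concurrent tasks and returns the ids that failed, so the loading screen can retry or report them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

using AssetCheck = std::function<bool(const std::string&)>;

// Makes sure every asset the level needs is on disk before it starts.
// `download` is assumed to return true once the asset is safely stored.
std::vector<std::string> prefetchLevel(const std::vector<std::string>& neededIds,
                                       const AssetCheck& isLocal,
                                       const AssetCheck& download,
                                       std::size_t maxParallel) {
    assert(maxParallel > 0);
    std::vector<std::string> missing;
    for (const auto& id : neededIds)
        if (!isLocal(id)) missing.push_back(id);

    std::vector<std::string> failed;
    // Download in batches so a level with hundreds of missing assets does not
    // start hundreds of threads at once.
    for (std::size_t start = 0; start < missing.size(); start += maxParallel) {
        std::size_t stop = std::min(missing.size(), start + maxParallel);
        std::vector<std::future<bool>> batch;
        for (std::size_t i = start; i < stop; ++i)
            batch.push_back(std::async(std::launch::async, download, missing[i]));
        for (std::size_t i = start; i < stop; ++i)
            if (!batch[i - start].get()) failed.push_back(missing[i]);
    }
    return failed;
}
```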

"But thousands of files will lead to a high risk of running out of FD or other resources."

You should not have the entire file system loaded at once, only the assets you are going to need for a particular level. If you need ALL files loaded at the same time, I would suggest going back to the drawing board and re-examining your design and architecture.
