
Slow zfs create with thousands of datasets

I am building a system on zfs where each instance of a certain entity gets its own dataset. This is needed because each entity consists of many small files that are really slow to copy or delete, so I decided to try relying on zfs datasets to destroy or snapshot/copy an entity in its entirety, regardless of its contents.

But now, during my benchmarks with around 5000+ datasets and counting, creating a new dataset with 'zfs create' sometimes takes up to 9 minutes. While 9 minutes is really slow, it is still acceptable; what worries me is that it will only get worse as the number of datasets grows, and 5000 isn't that many yet in my opinion.

System information:

  • Ubuntu 20.04 LTS
  • zfs 0.8.3
  • Pool consists of 2 × 10 TB disks
  • 16 CPUs, 126 GB RAM

Does anyone have experience working with large numbers of datasets in zfs and can tell me more about performance in such a situation? Or whether I am using zfs in a way it isn't intended to be used?

Internally, ZFS uses a concept called a txg (transaction group). Transaction groups let ZFS know what order operations happened in: there is exactly one current txg, identified by a single integer, at any given time (no parallelism, by design). Under normal circumstances a new txg is created every few seconds, to provide a reasonably recent recovery point if the system crashes. Closing a txg requires some work, mostly flushing any outstanding writes to disk. However, a new txg must also be created any time you mutate ZFS metadata by creating a new dataset, taking a snapshot, and so on, which means those operations are heavier than you might expect.
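If you want to watch this happen, OpenZFS on Linux exposes per-pool txg statistics through a kstat file. A minimal way to poke at it, assuming a Linux system like the one in the question; the pool name tank and the dataset name are placeholders:

    # Per-pool transaction group history (OpenZFS-on-Linux kstats);
    # shows recent txgs, their state, and how long each took to sync.
    cat /proc/spl/kstat/zfs/tank/txgs

    # A dataset creation has to wait for its txg to sync, so timing one
    # gives a rough feel for the per-operation metadata cost.
    time zfs create tank/scratch-test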

In your case, my guess is that your application is doing a ton of these filesystem operations (creations, deletions, snapshots, etc.), and the queue of pending requests just keeps getting longer because the system can't keep up.

There are three possible solutions:

  1. profile the system to figure out what the bottleneck is, and throw more hardware at the problem if possible (I'm guessing this is somewhat unlikely to work)
  2. make do with fewer filesystem operations by grouping directories together into a smaller number of filesystems or something
  3. batch up multiple filesystem operations into a single txg using a ZFS channel program, which is basically a Lua script, invoked in the middle of a txg sync, that can run arbitrary ZFS administrative operations (see the sketch after this list)

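To make option 3 concrete, here is a minimal sketch of a channel program, based on the interface described in zfs-program(8). The pool and dataset names are placeholders, and for very large batches you would need to chunk the work, since a channel program runs under instruction and memory limits:

    # Write a small Lua channel program that destroys several datasets
    # inside a single txg, then run it with `zfs program`.
    cat > /tmp/batch_destroy.zcp <<'EOF'
    -- Arguments passed after the script name arrive in args["argv"].
    args = ...
    for _, fs in ipairs(args["argv"]) do
        zfs.sync.destroy(fs)
    end
    EOF
    zfs program tank /tmp/batch_destroy.zcp tank/entity0001 tank/entity0002

Because the whole script executes as a single synctask, the two destroys above cost one txg instead of two; the same pattern works for zfs.sync.snapshot and the other zfs.sync.* functions.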
I have one last thought to leave you with: deleting a filesystem in ZFS looks instantaneous, but internally the filesystem is only hidden right away; its data is freed asynchronously by a background thread, which can take a while. You can see the amount of space still waiting to be freed by running zpool get freeing <pool>. So this whole design of using ZFS datasets to delete things faster might not actually be buying you that much. If you want the same behavior without the txg overhead, you could instead keep a queue inside your application, with a background thread that deletes directories that are no longer in use (a sketch of this idea follows below).
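A minimal sketch of that idea, assuming entities live as plain directories inside a single dataset (per option 2 above; all paths here are made up). Renaming within one filesystem is a single cheap metadata operation, so the entity disappears from view immediately, and the expensive recursive delete happens later:

    # Hide the entity instantly; rename within one filesystem is cheap.
    mv /tank/entities/entity0001 /tank/trash/entity0001.$(date +%s)

    # Later, from a background worker or cron job, reclaim the space.
    rm -rf /tank/trash/*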
