
Why is ListBlobsSegmentedAsync only returning results on second page?

The Problem

I'm trying to grab one page of up to 5000 blobs, with no prefix. The container in question has roughly 26,000 blobs in it. I consistently get no results on my first page, but I noticed the BlobContinuationToken that's returned isn't null, so I can page again and get results on the second page. Why aren't there any results on the first page, but there are on the second?

I'd like to be able to do this, and grab only one page:

var response = await container.ListBlobsSegmentedAsync(null).ConfigureAwait(false);

But this returns no results, so instead, I have to call it again, passing in the continuationToken, at which point I do get results.

What I've Considered

  • This only started happening when the container got smaller (it used to have over 100,000 blobs)
  • I'm doing frequent deletes on this container, but I couldn't find anything that said this should impact availability
  • I tried passing in true for useFlatBlobListing and it didn't change anything, but I don't really understand the option (as far as I'm aware, my container's contents are flat)
  • I've used ListBlobsSegmentedAsync before and never noticed this problem (but the containers were larger)
  • I'm using version 4.3.0 of the Storage SDK, which is outdated. I tried updating but it didn't fix the problem, so I went back
  • I've tried passing in a null continuationToken as well as a new BlobContinuationToken(). I'm not sure if one is preferable
  • I can verify that there are still 26,000 blobs in the container via the Cloud Explorer in Visual Studio, but not in code on the first page of results. What's the Cloud Explorer doing differently, I wonder?

Edit with more troubleshooting

On a larger container, after a while it started taking more than two page fetches to get results. Each page fetch (including the empty ones) took right around 5 seconds, until one finally returned results. At its peak I saw it take up to 12 page fetches, over 60 seconds in total, to return results on a container that had over 300,000 blobs. This was shortly after doing massive deletes on the container.

Answer

It's not at all unexpected that you can occasionally get empty pages, or pages with fewer than the maximum number of results, along with a continuation token. Why is this a problem if the returned continuation token takes you to your next page? If you don't want to deal with continuation tokens, ListBlobs (not the segmented version) will give you an iterator that lazily gets more blobs and follows the continuation tokens for you.
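If you do want to stay with the segmented call, one way to cope with empty pages is to keep following the continuation token until a page actually contains results. The following is only a minimal sketch, assuming the older WindowsAzure.Storage package (namespace Microsoft.WindowsAzure.Storage.Blob); the class and method names BlobPaging and GetFirstNonEmptyPageAsync are hypothetical and not part of the SDK.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobPaging
{
    // Hypothetical helper (not an SDK method): returns the first page that
    // contains at least one blob, or an empty list if the listing ends
    // without ever returning any.
    public static async Task<List<IListBlobItem>> GetFirstNonEmptyPageAsync(
        CloudBlobContainer container)
    {
        // A null token asks the service to start listing from the beginning.
        BlobContinuationToken token = null;

        do
        {
            BlobResultSegment segment = await container
                .ListBlobsSegmentedAsync(token)
                .ConfigureAwait(false);

            List<IListBlobItem> page = segment.Results.ToList();
            if (page.Count > 0)
            {
                return page;
            }

            // An empty page with a non-null token means "keep going";
            // a null token means the listing is complete.
            token = segment.ContinuationToken;
        } while (token != null);

        return new List<IListBlobItem>();
    }
}
```

This does not make the empty fetches any faster; it only hides them from the caller, so the total latency described in the question would still apply.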

As for the root cause, there are many reasons this could happen. My guess is the frequent deletes in your case, but that is only a guess. Returning fewer than the maximum number of results along with a continuation token happens for multiple reasons, but a couple I suspect here are:

  1. We hit the server-side timeout, so we return what we have so far.
  2. We hit the edge of a partition, which happens more frequently when the blob list is large and may span several machines.

If you're frequently deleting blobs and have a lot of them, it may take some time to actually garbage collect them, so we spend all our time scanning through entries that we don't return.
