
Azure Searching Metadata in blobs

I am trying to find a way to bring back only the items in blob storage whose metadata matches a particular piece of data. All files will have a key called 'FlightNo'.

What I really want is a way to find all files (listBlobs) that contain a match on the metadata, so one level up, and then iterate through that set of data to find further matches, as each file has 5 items of metadata.

Here is my very unfriendly code to date.

        foreach (IListBlobItem item in container.ListBlobs(null, false))
        {
            if (item.GetType() == typeof(CloudBlockBlob))
            {

                CloudBlockBlob blob = (CloudBlockBlob)item;

                blob.FetchAttributes();

                foreach (var metaDataItem in blob.Metadata)
                {
                    dictionary.Add(metaDataItem.Key, metaDataItem.Value);
                }

                if (dictionary.Where(r=>r.Key == "FlightNo" && r.Value == FlightNo).Any())
                {
                    if (dictionary.Where(r => r.Key == "FlightDate" && r.Value == FlightDate).Any())
                    {
                        if (dictionary.Where(r => r.Key == "FromAirport" && r.Value == FromAirport).Any())
                        {
                            if (dictionary.Where(r => r.Key == "ToAirport" && r.Value == ToAirport).Any())
                            {
                                if (dictionary.Where(r => r.Key == "ToAirport" && r.Value == ToAirport).Any())
                                {
                                    retList.Add(new BlobStorage()
                                    {
                                        Filename = blob.Name,
                                        BlobType = blob.BlobType.ToString(),
                                        LastModified = (DateTimeOffset)blob.Properties.LastModified,
                                        ContentType = blob.Properties.ContentType,
                                        Length = blob.Properties.Length,
                                        uri = RemoveSecondary(blob.StorageUri.ToString()),
                                        FlightNo = dictionary.Where(r => r.Key == "FlightNo").Select(r => r.Value).SingleOrDefault(),
                                        Fixture = dictionary.Where(r => r.Key == "FixtureNo").Select(r => r.Value).SingleOrDefault(),
                                        FlightDate = dictionary.Where(r => r.Key == "FlightDate").Select(r => r.Value).SingleOrDefault(),
                                        FromAirport = dictionary.Where(r => r.Key == "FromAirport").Select(r => r.Value).SingleOrDefault(),
                                        ToAirport = dictionary.Where(r => r.Key == "ToAirport").Select(r => r.Value).SingleOrDefault()
                                    });

                                }
                            }
                        }
                    }
                }

                dictionary.Clear();
            }
        }

Thanks. Scott

The accepted answer is highly inefficient: looping through every single blob and loading its metadata to check for values won't perform well with any reasonable volume of data.

It is possible to search blob metadata using Azure Search. A search index can be created that includes the blobs' custom metadata.

The following comprehensive articles explain it all:

Indexing Documents in Azure Blob Storage with Azure Search
Searching Blob storage with Azure Search
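
Once blob metadata has been indexed, a query usually narrows the results with an OData filter. As a rough sketch only, assuming the index defines filterable string fields named after the metadata keys (an assumption; the real field names depend on how the index is set up), the filter string could be assembled like this. `SearchFilter` is a hypothetical helper for illustration, not part of any SDK:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds an OData filter from field/value pairs.
public static class SearchFilter
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> criteria)
    {
        var clauses = criteria.Select(c =>
        {
            // OData string literals escape a single quote by doubling it.
            var escaped = c.Value.Replace("'", "''");
            return c.Key + " eq '" + escaped + "'";
        });

        // All criteria must hold, so the clauses are joined with "and".
        return string.Join(" and ", clauses);
    }
}
```

The resulting string would then be supplied as the filter of a search request against the index, so only documents whose fields match every criterion come back.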

Although it is still in preview, Blob Index now lets you run a query search on blob metadata (tags).

You won't need to loop through all of your blobs until you find what you're looking for.

Here's a snippet from the full article:

Blob Index—a managed secondary index, allowing you to store multi-dimensional object attributes to describe your data objects for Azure Blob storage—is now available in preview. Built on top of blob storage, Blob Index offers consistent reliability, availability, and performance for all your workloads. Blob Index provides native object management and filtering capabilities, which allows you to categorize and find data based on attribute tags set on the data.

If I understand correctly, you want to find the blobs whose metadata contains all 5 of the items you mentioned. You could use the following code to do that. I tested it on my side and it works correctly.

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var connectionString = "storage connection string";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("container");
var blobs = container.ListBlobs();
var blobList = new List<CloudBlockBlob>();
foreach (var item in blobs)
{
    // Skip virtual directories and non-block blobs instead of failing on the cast.
    var blob = item as CloudBlockBlob;
    if (blob == null) continue;

    // ListBlobs() does not return metadata by default, so load it for each blob.
    blob.FetchAttributes();

    // Keep the blob only when all five metadata items match.
    if (blob.Metadata.Contains(new KeyValuePair<string, string>("FlightNo", "FlightNoValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("FlightDate", "FlightDateValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("FromAirport", "FromAirportValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("ToAirport", "ToAirportValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("FixtureNo", "FixtureNoValue")))
    {
        blobList.Add(blob);
    }
}
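
The chained `Contains` checks can also be expressed as a single comparison against a dictionary of required values, which keeps the loop readable if the set of metadata keys changes. The following is a minimal sketch, not the answerer's code; `MetadataFilter` and `Matches` are hypothetical names, and values are compared with an ordinal, case-sensitive match (an assumption about the desired behaviour):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: checks blob metadata against a set of required values.
public static class MetadataFilter
{
    // Returns true only when every required key is present in the metadata
    // and its value is exactly equal to the required value.
    public static bool Matches(
        IDictionary<string, string> metadata,
        IReadOnlyDictionary<string, string> required)
    {
        return required.All(pair =>
            metadata.TryGetValue(pair.Key, out var actual) &&
            string.Equals(actual, pair.Value, StringComparison.Ordinal));
    }
}
```

Inside the loop, the condition would then become something like `if (MetadataFilter.Matches(blob.Metadata, required))`, where `required` is a `Dictionary<string, string>` holding the five key/value pairs. This still fetches attributes for every blob, so it tidies the code without changing the cost of the scan.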

You cannot search on metadata directly, but you can use tags, which are much the same as metadata from a practical point of view. Tags are indexed by the storage service, and the code to search for matching blobs is very straightforward:

    var query = $"@container = 'invoices' AND brand = 'volvo'";
    
    await foreach (var blob in blobServiceClient.FindBlobsByTagsAsync(query))
    {
        Console.WriteLine($"Container: {blob.BlobContainerName}, Blob: {blob.BlobName}");
    }
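
For the five-field case in the question, the query string can be built from key/value pairs rather than written by hand. This is a sketch under assumptions: `TagQuery` is a hypothetical helper, tag keys are written in double quotes (the form I believe the service documents, whereas the snippet above uses a bare key), and values containing a single quote are rejected rather than escaped:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a tag filter expression for one container.
public static class TagQuery
{
    public static string Build(
        string container,
        IEnumerable<KeyValuePair<string, string>> tags)
    {
        // Restrict the search to a single container first.
        var clauses = new List<string> { "@container = '" + container + "'" };

        foreach (var tag in tags)
        {
            // Refuse characters that would break out of the quoted key or value.
            if (tag.Key.Contains("\"") || tag.Value.Contains("'"))
            {
                throw new ArgumentException("Unsupported character in tag: " + tag.Key);
            }

            clauses.Add("\"" + tag.Key + "\" = '" + tag.Value + "'");
        }

        // Every tag condition must match, so the clauses are joined with AND.
        return string.Join(" AND ", clauses);
    }
}
```

The returned string could be passed to `FindBlobsByTagsAsync` in place of the hard-coded `query`. Note that this only works if the values are stored as blob tags, not as metadata.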
