
Azure BlobStorage blobs to Index

Is it possible to upload a document to blob storage and do the following:

  1. Grab the contents of the document and add them to the index.
  2. Grab key phrases from the contents in point 1 and add them to the index.

I want the key phrases to then be searchable.

I have code that uploads documents to blob storage, and it works perfectly, but the only way I know of to get them indexed is the "Import Data" option in the Azure Search service, which creates an index with predefined fields, as below:

[Screenshot of the predefined index fields]

This works great when I only need these fields, and the index is updated automatically every 5 minutes. But it becomes a problem when I want a custom index.

However, the only fields I actually want are the following:

  • fileId
  • fileText (the content of the document)
  • blobURL (to allow downloading the document)
  • keyPhrases (to be pulled from fileText; I have code that does this as well)

The only issue is that I need to retrieve the document content (fileText) to get the keyPhrases, but as far as I understand, I can only access that content if it is already in an index?
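On that last point: the text does not have to come back out of an index, because at upload time it is already in the uploaded stream. A minimal sketch, assuming the uploads are plain-text files and the stream is seekable (`UploadedText.ReadAllTextAndRewind` is a hypothetical helper, not part of any Azure SDK). PDFs or Office documents would need a separate text extractor, which is roughly the job the blob indexer behind "Import Data" does for you.

```csharp
using System;
using System.IO;
using System.Text;

public static class UploadedText
{
    // Hypothetical helper: decodes a plain-text upload into a string, then rewinds
    // the stream so the same stream can still be sent to blob storage afterwards.
    // It does not extract text from binary formats such as PDF or .docx.
    public static string ReadAllTextAndRewind(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("The stream must be seekable.", nameof(stream));

        long start = stream.Position;
        string text;

        // leaveOpen: true, so disposing the reader does not close the caller's stream.
        // detectEncodingFromByteOrderMarks: true, falling back to UTF-8 without a BOM.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 4096, true))
        {
            text = reader.ReadToEnd();
        }

        stream.Position = start;
        return text;
    }
}
```

In an upload action, something like `string fileText = UploadedText.ReadAllTextAndRewind(file.InputStream);` could run before the blob upload, with the result passed straight to the existing key-phrase code.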

I have very limited knowledge of Azure and am struggling to find anything similar to what I want to do.

The code I use to upload a document to my blob storage is as follows:

public CloudBlockBlob UploadBlob(HttpPostedFileBase file)
{
    string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
    string blobStorageKey = ConfigurationManager.AppSettings["BlobStorageKey"];
    string blobStorageName = ConfigurationManager.AppSettings["BlobStorageName"];
    string blobStorageURL = ConfigurationManager.AppSettings["BlobStorageURL"];
    string UserID = User.Identity.GetUserId();
    // "HH" is the 24-hour clock; "hh" would give 2 AM and 2 PM the same timestamp.
    string UploadDateTime = DateTime.Now.ToString("yyyyMMddHHmmss");
    string blobName = UserID + "_" + UploadDateTime + "_" + file.FileName;

    try
    {
        // Save a temporary local copy of the upload.
        var path = Path.Combine(Server.MapPath("~/App_Data/Uploads"), blobName);
        file.SaveAs(path);

        // StorageCredentials expects the storage *account* name and key; passing
        // searchServiceName only works if it happens to equal the account name.
        var credentials = new StorageCredentials(searchServiceName, blobStorageKey);
        var client = new CloudBlobClient(new Uri(blobStorageURL), credentials);

        // Retrieve a reference to a container. (You need to create one using the management portal, or call container.CreateIfNotExists())
        var container = client.GetContainerReference(blobStorageName);

        // Retrieve a reference to the blob that will hold this file.
        var blockBlob = container.GetBlockBlobReference(blobName);

        // Create or overwrite the blob with the contents of the local file.
        using (var fileStream = System.IO.File.OpenRead(path))
        {
            blockBlob.UploadFromStream(fileStream);
        }

        // Remove the temporary local copy.
        System.IO.File.Delete(path);
        return blockBlob;
    }
    catch (Exception e)
    {
        // The error is swallowed here; callers only see null.
        var r = e.Message;
        return null;
    }
}
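The upload code derives the blob name from the user ID, a timestamp and the file name, and needs the same string for both the temporary file and the blob. A small sketch of building that name in one place, assuming the same `UserID_timestamp_FileName` scheme (`BlobNaming.BuildBlobName` is a hypothetical name). It also pins the timestamp to the invariant culture and the 24-hour clock, and strips any client-side directory some browsers include in `FileName`.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class BlobNaming
{
    // Hypothetical helper: builds "<userId>_<yyyyMMddHHmmss>_<fileName>".
    public static string BuildBlobName(string userId, DateTime uploadTime, string fileName)
    {
        if (string.IsNullOrEmpty(userId)) throw new ArgumentException("userId is required.", nameof(userId));
        if (string.IsNullOrEmpty(fileName)) throw new ArgumentException("fileName is required.", nameof(fileName));

        // "HH" = 24-hour clock; InvariantCulture keeps the Gregorian calendar
        // regardless of the server's regional settings.
        string stamp = uploadTime.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);

        // Keep only the file-name part if a full client path was sent.
        string safeName = Path.GetFileName(fileName);

        return userId + "_" + stamp + "_" + safeName;
    }
}
```

Usage would be a single `var blobName = BlobNaming.BuildBlobName(UserID, DateTime.Now, file.FileName);`, reused for both `Path.Combine` and `GetBlockBlobReference`.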

I hope I haven't given too much information, but I don't know how else to explain what I am looking for. If I am not making sense, please let me know so that I can fix my question.

I am not looking for handout code, just a shove in the right direction.

I would appreciate any help.

Thanks!

We can use Azure Search to index documents through either the Azure Search REST API or the .NET SDK. Based on your description, I created a demo with the .NET SDK and tested it successfully. These are my detailed steps:

  1. Create an Azure Search service in the Azure portal.


  2. Get the search service admin API key from the Azure portal.


  3. Create a custom index field model.

    [SerializePropertyNamesAsCamelCase]
    public class TomTestModel
    {
        [Key]
        [IsFilterable]
        public string fileId { get; set; }
        [IsSearchable]
        public string fileText { get; set; }
        public string blobURL { get; set; }
        [IsSearchable]
        public string keyPhrases { get; set; }
    }

  4. Create the data source.

    string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
    string adminApiKey = ConfigurationManager.AppSettings["SearchServiceAdminApiKey"];
    SearchServiceClient serviceClient = new SearchServiceClient(searchServiceName, new SearchCredentials(adminApiKey));

    // Arguments: data source name, storage connection string, container name.
    // A data source is only read by an indexer; the manual document upload further down does not use it.
    var dataSource = DataSource.AzureBlobStorage("data source name", "connection string", "container name");

    // Recreate the data source if it already exists.
    if (serviceClient.DataSources.Exists(dataSource.Name))
    {
        serviceClient.DataSources.Delete(dataSource.Name);
    }
    serviceClient.DataSources.Create(dataSource);
  5. Create the custom index.

    var definition = new Index()
    {
        Name = "tomcustomindex",
        Fields = FieldBuilder.BuildForType<TomTestModel>()
    };

    // Recreate the index if it already exists.
    if (serviceClient.Indexes.Exists(definition.Name))
    {
        serviceClient.Indexes.Delete(definition.Name);
    }
    var index = serviceClient.Indexes.Create(definition);


  6. Upload documents to the index. For more information about working with storage through the SDK, please refer to the storage documentation.

    CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connection string");
    var blobClient = storageAccount.CreateCloudBlobClient();
    var container = blobClient.GetContainerReference("container name");
    var blobList = container.ListBlobs();

    // "Blob Content" and "key phrases" are placeholders: fill them with the
    // blob's text and the phrases extracted from it.
    var tomIndexsList = blobList.Select(blob => new TomTestModel
    {
        fileId = Guid.NewGuid().ToString(),
        blobURL = blob.Uri.ToString(),
        fileText = "Blob Content",
        keyPhrases = "key phrases",
    }).ToList();

    var batch = IndexBatch.Upload(tomIndexsList);
    // Use the name of the index created above ("tomcustomindex").
    ISearchIndexClient indexClient = serviceClient.Indexes.GetClient("tomcustomindex");
    indexClient.Documents.Index(batch);
  7. Check the search results in Search explorer.

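The document upload code leaves two gaps. `fileId = Guid.NewGuid()` produces a new key on every run, so re-running the upload adds duplicate documents instead of replacing existing ones (an `IndexBatch.Upload` action replaces a document only when the key matches), and `fileText`/`keyPhrases` are placeholders. A sketch of the pure-.NET pieces, assuming the key is derived from the blob URL and the phrases come from your existing key-phrase code; all names below are hypothetical. The text itself could come from each blob, for example by casting the list item to `CloudBlockBlob` and calling `DownloadText()`, which only makes sense for plain-text content.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class SearchDocumentHelpers
{
    // Hypothetical helper: derives a stable document key from the blob URL.
    // Azure Search keys may only contain letters, digits, '_', '-' and '=',
    // so Base64's '+' and '/' are swapped for '-' and '_' ('=' padding is allowed).
    public static string ToDocumentKey(string blobUrl)
    {
        if (string.IsNullOrEmpty(blobUrl)) throw new ArgumentException("blobUrl is required.", nameof(blobUrl));
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(blobUrl));
        return base64.Replace('+', '-').Replace('/', '_');
    }

    // Inverse of ToDocumentKey, e.g. to map a search hit back to its blob URL.
    public static string FromDocumentKey(string key)
    {
        string base64 = key.Replace('-', '+').Replace('_', '/');
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }

    // Flattens extracted phrases into the single string field keyPhrases
    // (declared as string in TomTestModel): trims, drops blanks and
    // case-insensitive duplicates, keeps first-seen order.
    public static string JoinKeyPhrases(IEnumerable<string> phrases)
    {
        if (phrases == null) return string.Empty;
        return string.Join(", ", phrases
            .Where(p => !string.IsNullOrWhiteSpace(p))
            .Select(p => p.Trim())
            .Distinct(StringComparer.OrdinalIgnoreCase));
    }
}
```

Inside the `Select` above, this would look like `fileId = SearchDocumentHelpers.ToDocumentKey(blob.Uri.ToString())` and `keyPhrases = SearchDocumentHelpers.JoinKeyPhrases(phrases)`, so re-running the upload overwrites the same documents.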

packages.config file:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Azure.KeyVault.Core" version="1.0.0" targetFramework="net452" />
  <package id="Microsoft.Azure.Search" version="3.0.0-rc" targetFramework="net452" />
  <package id="Microsoft.Data.Edm" version="5.6.4" targetFramework="net452" />
  <package id="Microsoft.Data.OData" version="5.6.4" targetFramework="net452" />
  <package id="Microsoft.Data.Services.Client" version="5.6.4" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime" version="2.3.4" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime.Azure" version="3.3.4" targetFramework="net452" />
  <package id="Microsoft.Spatial" version="6.15.0" targetFramework="net452" />
  <package id="Newtonsoft.Json" version="7.0.1" targetFramework="net452" />
  <package id="System.Spatial" version="5.6.4" targetFramework="net452" />
  <package id="WindowsAzure.Storage" version="7.2.1" targetFramework="net452" />
</packages>

TomTestModel file:

using System.ComponentModel.DataAnnotations;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

namespace TomAzureSearchTest
{
    [SerializePropertyNamesAsCamelCase]
    public class TomTestModel
    {
        [Key]
        [IsFilterable]
        public string fileId { get; set; }
        [IsSearchable]
        public string fileText { get; set; }
        public string blobURL { get; set; }
        [IsSearchable]
        public string keyPhrases { get; set; }
    }
}
