
Azure Search Service: comparing documents to be uploaded and deleting

I'm very new to the Azure Search Service. For the current project that I am working on, I am uploading a large number of documents to an Azure Search index. We will be using the Azure Cognitive Search API (documentation here https://docs.microsoft.com/en-us/rest/api/searchservice/addupdate-or-delete-documents ) to upload and add new documents using the mergeOrUpload action. This approach is fine so long as we are adding new data that doesn't exist already.

I have been trying to find out whether there is a way of comparing the documents already in the index with what I am about to upload, to see if there's any data that should be deleted. I.e. what I am about to upload shows that some documents should no longer be in the index, and I want to delete only those specific ones. I can't see that any of the upload, merge etc. actions will help here. There is a delete action, but this removes a specified document and relies on me knowing exactly which document needs to be deleted, whereas if possible I'd prefer a way of comparing that removes the need for any manual intervention. Does anyone know of a way to handle this?

You need to define a unique ID (the key field) for the documents in your index. With mergeOrUpload, Azure Cognitive Search checks whether a document with the key you're sending already exists. If it does, the fields you send are merged into the existing document; if there's no match for the key, the document is inserted as new.
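To make the mergeOrUpload behaviour concrete, here is a minimal sketch of how a batch for the documents endpoint could be assembled. It only builds the JSON body; the key field name `id`, the `title` field and the helper name `build_index_batch` are assumptions for illustration, not something from the question.

```python
import json

def build_index_batch(documents, action="mergeOrUpload"):
    """Wrap plain documents in a batch body, tagging each with a search action."""
    allowed = {"upload", "merge", "mergeOrUpload", "delete"}
    if action not in allowed:
        raise ValueError(f"unknown action: {action}")
    # Every document in the batch carries its own "@search.action" property.
    return {"value": [{"@search.action": action, **doc} for doc in documents]}

# Hypothetical documents; "id" is assumed to be the key field of the index.
batch = build_index_batch([
    {"id": "1", "title": "First document"},
    {"id": "2", "title": "Second document"},
])
print(json.dumps(batch, indent=2))
```

The resulting body would then be POSTed to the index's `docs/index` endpoint with your service name, index name, API version and API key filled in.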

There is a difference between using the API directly to push content, as you describe here, and defining a data source.

If you want deletes to be handled for you, you can upload your content to Azure Blob Storage or some other type of content source that is supported out of the box. In this scenario, you define a data source and wire it up to your storage. As you add, change and delete content, the necessary changes are reflected in Azure Search for you (for deletions, the data source typically needs a deletion detection policy, such as soft delete, to be configured). See the article "Import data wizard for Azure Cognitive Search" for a step-by-step example.
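As a rough sketch of what such a data source definition might look like when created through the REST API rather than the wizard, the helper below builds a Blob Storage data source body with a soft-delete detection policy. The helper name, the `IsDeleted` metadata property and the marker value are assumptions for illustration; check the current data source documentation for the exact property names before relying on this.

```python
def blob_data_source(name, connection_string, container,
                     marker_property="IsDeleted", marker_value="true"):
    """Build a data source body that marks blobs as deleted via a metadata flag."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        # Assumed policy: blobs whose metadata property equals the marker
        # value are treated as deleted on the next indexer run.
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": marker_property,
            "softDeleteMarkerValue": marker_value,
        },
    }

# Placeholder values; substitute your own storage details.
source = blob_data_source("my-blob-source", "<connection string>", "my-container")
```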

When you use the API, you are responsible for keeping track of the state of your documents. When content is added, changed, or deleted, you have to do what's necessary to reflect that in the index.
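One way to do that tracking yourself is to diff document keys: anything currently in the index that is missing from the new data set gets a delete action, and everything else gets mergeOrUpload. The sketch below assumes the incoming documents represent the complete desired state, that the key field is called `id`, and that you already have the list of keys in the index (for example from your own records, or by paging through a query that selects only the key field). `plan_sync` is a hypothetical helper name.

```python
def plan_sync(existing_keys, incoming_docs, key_field="id"):
    """Build one batch that upserts incoming documents and deletes stale ones."""
    incoming_keys = {doc[key_field] for doc in incoming_docs}
    # Keys that are in the index but absent from the new data set are stale.
    stale_keys = sorted(set(existing_keys) - incoming_keys)
    actions = [{"@search.action": "mergeOrUpload", **doc} for doc in incoming_docs]
    # A delete action only needs the key of the document to remove.
    actions += [{"@search.action": "delete", key_field: key} for key in stale_keys]
    return {"value": actions}

# Hypothetical example: the index holds 1, 2, 3 and the new data holds 2 and 4.
plan = plan_sync(
    existing_keys=["1", "2", "3"],
    incoming_docs=[{"id": "2", "title": "Updated"}, {"id": "4", "title": "New"}],
)
deletes = [a["id"] for a in plan["value"] if a["@search.action"] == "delete"]
print(deletes)  # ['1', '3']
```

For large data sets the batch would need to be split into chunks that fit the service's per-request limits.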
