Azure cognitive search indexer blob storage

Original post: 2021-01-11 10:42:26. Tags: azure-blob-storage, azure-cognitive-search, azure-cognitive-services, indexer

I am stuck in a complicated situation and would appreciate any help.

I was testing indexing of blob storage (PDF files) and indexed a copy of my storage in a QA environment, which cost me some money.

My question is: is there any way to use this index in production without indexing again?

I found a way to copy the index, and that works fine, but when I add an indexer that is connected to the production blob storage, it starts indexing from scratch again (as I expected). Is there any way to avoid this? Is there any way to ask the indexer to index only from now on?

I tried to reuse the index and the indexer that I already have by changing the subscription to prod. But I have to change the indexer's data source to point at the production blob storage, and in that case I get an error:

Indexer 'filesIndexer' currently references data source 'qafilesds' and cannot be updated to reference a different datasource 'prodfilesds' because it has a non-empty change tracking state, or it is currently in progress. You can use Reset API to reset the indexer's change tracking state when it is no longer in progress, and retry this call.

1 Answer

A simple answer to your first question is to use the QA index you built.

A more complicated answer is to switch from the pull model you are using now to a push model. From your explanation above I assume all of your content comes from blob storage, and that you have configured an indexer to do the indexing for you. This is known as the pull model.

The alternative is to use the Azure Cognitive Search SDK to write your own application that submits content to the index instead (the push model). In this case you do not use the built-in indexer, only the index itself. Then you are free to use whatever logic you want to determine what to index and what to skip. You can even enable your storage accounts to notify your application with events when content is updated.
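
As a rough illustration of the "index only from now on" logic described above, here is a minimal Python sketch. It assumes the azure-search-documents package for the upload step. The BlobInfo type, the field names (id, name, last_modified), the key scheme, and the cutoff date are all hypothetical and would have to match the schema and keys of the index you copied. Extracting text from the PDFs is left out; with the push model your application has to do that itself.

```python
import base64
from datetime import datetime, timezone
from typing import Dict, Iterable, List, NamedTuple


class BlobInfo(NamedTuple):
    """Hypothetical minimal view of a blob: only what the filter needs."""
    name: str
    last_modified: datetime


def make_key(blob_name: str) -> str:
    """Encode a blob name as URL-safe base64 so it is a valid document key.

    Assumption: the copied index may use a different key scheme; keys must
    match it, otherwise uploads create new documents instead of updating.
    """
    return base64.urlsafe_b64encode(blob_name.encode("utf-8")).decode("ascii")


def select_new_blobs(blobs: Iterable[BlobInfo], cutoff: datetime) -> List[BlobInfo]:
    """Keep only blobs modified after the cutoff, skipping already-indexed ones."""
    return [b for b in blobs if b.last_modified > cutoff]


def to_documents(blobs: Iterable[BlobInfo]) -> List[Dict[str, str]]:
    """Build index documents; the field names here are illustrative only."""
    return [
        {
            "id": make_key(b.name),
            "name": b.name,
            "last_modified": b.last_modified.isoformat(),
        }
        for b in blobs
    ]


def push_documents(endpoint: str, index_name: str, api_key: str, documents: List[dict]):
    """Upload documents to the index directly, without using an indexer."""
    # Imported lazily so the filtering logic above runs without the SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
    return client.merge_or_upload_documents(documents=documents)


if __name__ == "__main__":
    # Hypothetical cutoff: the time the QA copy was indexed.
    cutoff = datetime(2021, 1, 11, tzinfo=timezone.utc)
    blobs = [
        BlobInfo("old.pdf", datetime(2020, 12, 1, tzinfo=timezone.utc)),
        BlobInfo("new.pdf", datetime(2021, 2, 1, tzinfo=timezone.utc)),
    ]
    docs = to_documents(select_new_blobs(blobs, cutoff))
    print([d["name"] for d in docs])  # prints ['new.pdf']
```

In a real application the BlobInfo list would come from listing the production container, or from storage events delivered to your application, and the resulting documents would be passed to push_documents with your own endpoint, index name, and key.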