简体   繁体   中英

How fast is Azure Search Indexer and how I can index faster?

Each index batch is limited from 1 to 1000 documents. When I call it from my local machine or azure VM, I got 800ms to 3000ms per 1000 doc batch. If I submit multiple batches with async, the time spent is roughly the same. That means it would take 15 - 20 hours for my ~50M document collection.

Is there a way I can make it faster?

It looks like you are using our Standard S1 search service and although there are a lot of things that can impact how fast data can be ingested. I would expect to see ingestion to a single partition search service at a rate of about 700 docs / second for an average index, so I think your numbers are not far off from what I would expect, although please note that these are purely rough estimates and you may see different results based on any number of factors (such as number of fields, quantity of facets, etc)..

It is possible that some of the extra time you are seeing is due to the latency of uploading the content from your local machine to Azure, and it would likely be faster if you did this directly from Azure but if this is just a one time-upload that probably is not worth the effort.

You can slightly increase the speed of data ingestion by increasing the number of partitions you have and the S2 Search Service will also ingest data faster. Although both of these come at a cost.

By the way, if you have 50 M documents, please make sure that you allocate enough partitions since a single S1 partition can handle 15M documents or 25GB so you will definitely need extra partitions for this service.

Also as another side note, when you are uploading your content (and especially if you choose to do parallelized uploads), keep an eye on the HTTP responses because if the search service exceeds the resources available you could get HTTP 207 (indicating one or more item failed to apply) or 503's indicating the whole batch failed due to throttling. If throttling occurs, you would want to back off a bit to let the service catch up.

I think you're reaching the request capacity:

https://azure.microsoft.com/en-us/documentation/articles/search-limits-quotas-capacity/

I would try another tier (s1, s2). If you still face the same problem, try get in touch with support team.

Another option:

Instead of pushing data, try to add your data to the blob storage, documentDb or Sql Database, and then, use the pull approach:

https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM