简体   繁体   中英

How to break up large document into smaller answer units on Retrieve and Rank?

I am still very new to Retrieve and Rank, and Document Conversion services, so I have been playing around with that lately.

I encountered a problem where when I upload a large document (100+ pages) - Retrieve and Rank would help me automatically break it up into answer units, which is great and helpful.

However, some questions only require ONE small line in the big chunks of answer units, is there a way that I can manually break further down the answer units that Retrieve and Rank service has provided me?

I heard that you can do it through JavaScript, but is there a way to do it through the UI?

I am contemplating to manually break up the huge doc into multiple smaller documents, but that could potentially lead to 100s of them - which is probably the last option that I'd resort to.

Any help or suggestions is greatly appreciated!

Thank you all!

First off, one clarification:

Retrieve and Rank does not break up your documents into answer units. That is something that the Document Conversion Service does when your conversion target is ANSWER_UNITS .

Regarding your question:

I don't fully understand exactly what you're trying to do, but if the answer units that are produced by default don't meet your requirements, you can customize different steps of the conversion process to adjust the produced answer units. Take a look at the documentation here .

Specifically, you want to make sure that the heading levels (for Word, PDF or HTML, depending on your document type) are defined in a way that they detect the start of each answer unit. Then, make sure that the heading levels that you defined (h1, h2, h3, etc.) are included in the selector_tags list within the answer_units section.

Once your custom Document Conversion Service configuration produces the answer units you are looking for, you will be ready to send them to Retrieve and Rank to be indexed.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM