
How to break up a large document into smaller answer units on Retrieve and Rank?

I am still very new to the Retrieve and Rank and Document Conversion services, so I have been experimenting with them lately.

I ran into a problem: when I upload a large document (100+ pages), Retrieve and Rank automatically breaks it up into answer units, which is great and helpful.

However, some questions only need ONE short line from within a large answer unit. Is there a way to manually break down further the answer units that the Retrieve and Rank service has given me?

I heard that you can do it through JavaScript, but is there a way to do it through the UI?

I am considering manually breaking up the huge doc into multiple smaller documents, but that could lead to hundreds of them, so it is probably the last option I'd resort to.

Any help or suggestions are greatly appreciated!

Thank you all!

First off, one clarification:

Retrieve and Rank does not break up your documents into answer units. That is something the Document Conversion service does when your conversion target is ANSWER_UNITS.

Regarding your question:

I don't fully understand what you're trying to do, but if the answer units produced by default don't meet your requirements, you can customize different steps of the conversion process to adjust them. Take a look at the Document Conversion service documentation.

Specifically, you want to make sure that the heading levels (for Word, PDF or HTML, depending on your document type) are defined so that they detect the start of each answer unit. Then, make sure that the heading levels you defined (h1, h2, h3, etc.) are included in the selector_tags list within the answer_units section.
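As a rough illustration of the two steps above, here is a minimal Python sketch that builds such a configuration as a plain dictionary and prints it as JSON. The conversion target ANSWER_UNITS, the answer_units section and selector_tags come from the answer; the layout of the pdf heading block (fonts entries with level, min_size and max_size) is written from memory of the service's configuration format, and the font sizes and the helper name build_conversion_config are purely hypothetical, so check everything against the documentation and your own document before relying on it.

```python
import json


def build_conversion_config(pdf_font_levels, selector_tags):
    """Build a Document Conversion config dict (hypothetical helper).

    pdf_font_levels maps a heading level (1, 2, 3, ...) to a
    (min_size, max_size) font-size range that should be treated as
    that heading level. The key names under "pdf" are an assumption
    about the service's config format, not a verified schema.
    """
    fonts = [
        {"level": level, "min_size": min_size, "max_size": max_size}
        for level, (min_size, max_size) in sorted(pdf_font_levels.items())
    ]
    return {
        # Ask the service to emit answer units rather than plain text or HTML.
        "conversion_target": "ANSWER_UNITS",
        # Step 1: define which text counts as a heading of each level.
        "pdf": {"heading": {"fonts": fonts}},
        # Step 2: list the heading tags that start a new answer unit.
        "answer_units": {"selector_tags": list(selector_tags)},
    }


# Example values only: tune the size ranges to the fonts in your document.
config = build_conversion_config(
    {1: (24, 80), 2: (18, 23), 3: (14, 17)},
    ["h1", "h2", "h3"],
)
print(json.dumps(config, indent=2))
```

Adding deeper heading levels to both the heading definition and selector_tags should yield smaller answer units, since each listed tag starts a new unit. That only helps if the document actually marks those sub-sections with distinguishable headings.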

Once your custom Document Conversion service configuration produces the answer units you are looking for, you will be ready to send them to Retrieve and Rank to be indexed.
