
How to efficiently store text content in Elasticsearch and make it searchable

I have data in a file in the form shown below. How can I split it into its sections, store each section in an Elasticsearch index, and search by a unique identifier?

Sample data:

SSLEGGU00402-IM    13949 13949    58     1  285228   3094844 1U00402-IM    13949
   200  1490   400  1490   600  1490   800  1490  1000  1490 2U00402-IM    13949
  1200  1490  1400  1491  1600  1493  1800  1497  2000  1504 3U00402-IM    13949
SSLEGGU00412-IM    13885 13885    58     1  286359   3094844 1U00412-IM    13885
   200  1489   400  1489   600  1489   800  1489  1000  1489 2U00412-IM    13885
  1200  1489  1400  1490  1600  1493  1800  1497  2000  1505 3U00412-IM    13885

I would like to store SSLEGGU00402 and SSLEGGU00412 as separate documents, and I need to search by those identifiers.

Does Elasticsearch provide a built-in way to split this text and store it, or do we need to split it programmatically before indexing it into Elasticsearch?

A good start would be to look into Elasticsearch's Ingest Node and its Processors. They can be used to apply transformations to documents before they get indexed.

If your source data is well defined and adheres to a specific pattern, you can use the Grok processor to convert it into structured JSON for indexing.

https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
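As a rough illustration of the Grok approach, the sketch below builds an ingest pipeline body in Python and prints it as JSON. The field names `record_line`, `record_id`, and `payload` are hypothetical, and the pattern assumes every header line starts with an identifier followed by `-IM`, as in the sample data. It is a starting point under those assumptions, not a tested pipeline for the full file format.

```python
import json

# Hypothetical pipeline body. The field names (record_line, record_id,
# payload) are illustrative assumptions, not part of the original question.
pipeline = {
    "description": "Extract the record id from a header line (illustrative)",
    "processors": [
        {
            "grok": {
                # Field of the incoming document that holds the raw header line.
                "field": "record_line",
                # WORD captures e.g. SSLEGGU00402; the rest of the line is
                # kept unparsed in payload.
                "patterns": [r"^%{WORD:record_id}-IM\s+%{GREEDYDATA:payload}$"],
            }
        }
    ],
}

# Print the JSON body that would be sent to the ingest pipeline API.
print(json.dumps(pipeline, indent=2))
```

The printed JSON could be submitted with `PUT _ingest/pipeline/<name>` and tried against a few sample documents using the pipeline `_simulate` endpoint before any real indexing.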

If you need more sophisticated pre-processing logic, you can define your own pipeline:

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/ingest.html
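Ingest processors work on one document at a time, and each record in the sample spans several lines, so one option is to group the lines on the client before indexing. The sketch below is a minimal example of that, assuming a new record starts on every line beginning with `SSLEGG` and that the identifier is the first token up to the hyphen. The function names, the `records` index name, and the `content` field are hypothetical.

```python
import json

SAMPLE = """\
SSLEGGU00402-IM    13949 13949    58     1  285228   3094844 1U00402-IM    13949
   200  1490   400  1490   600  1490   800  1490  1000  1490 2U00402-IM    13949
  1200  1490  1400  1491  1600  1493  1800  1497  2000  1504 3U00402-IM    13949
SSLEGGU00412-IM    13885 13885    58     1  286359   3094844 1U00412-IM    13885
   200  1489   400  1489   600  1489   800  1489  1000  1489 2U00412-IM    13885
  1200  1489  1400  1490  1600  1493  1800  1497  2000  1505 3U00412-IM    13885
"""


def split_records(text, prefix="SSLEGG"):
    """Group lines into records; a line starting with prefix opens a new one."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(prefix):
            # Assumption: the id is the first token with the "-IM" suffix cut off.
            record_id = line.split()[0].split("-")[0]
            records.append({"record_id": record_id, "content": line})
        elif records:
            # Continuation line: append to the most recent record.
            records[-1]["content"] += "\n" + line
    return records


def to_bulk_ndjson(records, index="records"):
    """Build a newline-delimited body in the shape the bulk API expects."""
    lines = []
    for record in records:
        # Action line, using the record id as the document id.
        lines.append(json.dumps({"index": {"_index": index, "_id": record["record_id"]}}))
        # Source line with the document itself.
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


records = split_records(SAMPLE)
for record in records:
    print(record["record_id"], len(record["content"].splitlines()))
print(to_bulk_ndjson(records))
```

The resulting body could be posted to the `_bulk` endpoint. Using the identifier as the document `_id` allows a direct lookup by id, and a search on `record_id` would also work if that field is mapped as a keyword. Older versions such as 6.x may additionally expect a document type in the action line or URL.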

Thanks.

