How to efficiently store text content in Elasticsearch and make it searchable
I have data in a file in the following form. How can I split the different sections, store each one in an Elasticsearch index, and search based on a unique number?
Sample data:
SSLEGGU00402-IM 13949 13949 58 1 285228 3094844 1U00402-IM 13949
200 1490 400 1490 600 1490 800 1490 1000 1490 2U00402-IM 13949
1200 1490 1400 1491 1600 1493 1800 1497 2000 1504 3U00402-IM 13949
SSLEGGU00412-IM 13885 13885 58 1 286359 3094844 1U00412-IM 13885
200 1489 400 1489 600 1489 800 1489 1000 1489 2U00412-IM 13885
1200 1489 1400 1490 1600 1493 1800 1497 2000 1505 3U00412-IM 13885
I would like to store SSLEGGU00402 as one document and SSLEGGU00412 as a separate document, and I need to search based on these identifiers.
Does Elasticsearch provide a built-in way to split this text and store it, or do we need to split it programmatically and then index it into Elasticsearch?
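As far as I know, an ingest pipeline transforms one incoming document at a time and does not fan a single document out into several, so the per-record split usually happens before indexing. A minimal client-side sketch in Python might look like this. It assumes every record starts on a line beginning with `SSLEGG...-IM` and that the identifier is the text before `-IM`; the function name `split_records` and the field names `record_id` and `raw` are hypothetical.

```python
import re

# Sample records copied from the question.
RAW = """\
SSLEGGU00402-IM 13949 13949 58 1 285228 3094844 1U00402-IM 13949
200 1490 400 1490 600 1490 800 1490 1000 1490 2U00402-IM 13949
1200 1490 1400 1491 1600 1493 1800 1497 2000 1504 3U00402-IM 13949
SSLEGGU00412-IM 13885 13885 58 1 286359 3094844 1U00412-IM 13885
200 1489 400 1489 600 1489 800 1489 1000 1489 2U00412-IM 13885
1200 1489 1400 1490 1600 1493 1800 1497 2000 1505 3U00412-IM 13885
"""

# Assumption: a record starts on a line like "SSLEGGU00402-IM ..." and the
# identifier is everything before "-IM".
HEADER = re.compile(r"^(?P<record_id>SSLEGG\w+)-IM\b")


def split_records(text: str) -> list[dict]:
    """Group lines into one dict per record, keyed by the SSLEGG identifier."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A header line starts a new record (one future document).
            current = {"record_id": match.group("record_id"), "lines": []}
            records.append(current)
        if current is None:
            raise ValueError(f"data line before any header: {line!r}")
        current["lines"].append(line)
    # Re-join each record's lines so the document keeps its full raw text.
    return [
        {"record_id": r["record_id"], "raw": "\n".join(r["lines"])}
        for r in records
    ]
```

Each resulting dict can then be indexed as its own document, ideally with `record_id` used as the document `_id` so that lookups by identifier are direct.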
A good start would be to look into Elasticsearch's Ingest Node and its Processors. They can be used to apply transformations to documents before they are indexed.
If your source data is well-defined and follows a specific pattern, you may be able to use the GROK processor to convert it into structured JSON for indexing.
https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
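As a sketch of what a grok-based pipeline might look like for this data, the Python snippet below builds a request body that could be sent with `PUT _ingest/pipeline/<pipeline-id>`. It assumes each record (all three of its lines) arrives as one document whose raw text is in a `message` field; the pipeline id and the field names `record_id`, `value_1` and `value_2` are hypothetical. `WORD` and `INT` are standard grok patterns. The local regex is only a rough stand-in for sanity-checking the pattern without a cluster, not how Elasticsearch itself evaluates grok.

```python
import json
import re

# Hypothetical pipeline id; send pipeline_body with PUT _ingest/pipeline/<id>.
PIPELINE_ID = "sslegg-records"

# WORD and INT are built-in grok patterns; the field names after the colons
# (record_id, value_1, value_2) are hypothetical.
GROK_PATTERN = "^%{WORD:record_id}-IM %{INT:value_1} %{INT:value_2}"

pipeline_body = {
    "description": "Extract the record id from the first line of each record",
    "processors": [
        {
            "grok": {
                # Assumes the raw text of one record arrives in "message".
                "field": "message",
                "patterns": [GROK_PATTERN],
            }
        }
    ],
}

# Rough local stand-in for the grok pattern (WORD ~ \b\w+\b, INT ~ [+-]?\d+),
# used only to sanity-check the sample lines without a cluster.
LOCAL_EQUIVALENT = re.compile(
    r"^(?P<record_id>\b\w+\b)-IM (?P<value_1>[+-]?\d+) (?P<value_2>[+-]?\d+)"
)

if __name__ == "__main__":
    print(json.dumps(pipeline_body, indent=2))
```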
If you have more sophisticated logic that needs to be applied during data pre-processing, you can create your own pipeline:
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/ingest.html
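Once a pipeline exists, documents can be routed through it at index time with the `pipeline` query parameter, for example `POST _bulk?pipeline=<pipeline-id>`. The sketch below builds such a bulk (NDJSON) body plus a `term` query for searching by identifier. It assumes the pipeline extracts a `record_id` field, that `record_id` is mapped as `keyword` (shown as a typeless mapping, which applies to Elasticsearch 7 and later, not the 6.2 docs linked above), and that each input dict has `record_id` and `raw` keys; the index and pipeline names are hypothetical.

```python
import json

# Hypothetical names; adjust to your cluster.
INDEX = "sslegg"
PIPELINE_ID = "sslegg-records"

# Body for PUT <INDEX> (typeless mapping, Elasticsearch 7+). "keyword" keeps
# record_id unanalyzed so a term query matches it exactly.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "record_id": {"type": "keyword"},
            "message": {"type": "text"},
        }
    }
}


def bulk_body(docs: list[dict]) -> str:
    """NDJSON body for POST _bulk?pipeline=<PIPELINE_ID>.

    Each doc is assumed to be {"record_id": ..., "raw": ...}; record_id is
    reused as _id so GET <INDEX>/_doc/<record_id> is a direct lookup.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": INDEX, "_id": doc["record_id"]}}))
        # Only the raw text is sent; the pipeline is expected to add record_id.
        lines.append(json.dumps({"message": doc["raw"]}))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def term_query(record_id: str) -> dict:
    """Exact-match search body for POST <INDEX>/_search."""
    return {"query": {"term": {"record_id": record_id}}}
```

Using the identifier as `_id` means a single record can be fetched without a search at all, while the `term` query covers cases where the identifier is only known as a field value.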
Thanks.
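Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.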