How to index binary file in ElasticSearch without using Base64

I'm using the NodeJS elasticsearch package to interact with ElasticSearch. I have a document that has a file field. I want to be able to upload a file to the index, but the only way I have found is to use the elasticsearch-mapper-attachment plugin.

The problem is that if I use it, I have to load the whole file into memory, encode it to Base64, and then pass the resulting string to ElasticSearch.

I'd like to be able to pass a Stream to ElasticSearch (referencing any binary file: pdf, xls, doc, ppt).

The elasticsearch-mapper-attachment plugin parses the uploaded binary file and extracts its text for further indexing using the built-in Tika extractor.
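To make the plugin route concrete, here is a minimal sketch of the document body it expects. It assumes the index mapping declares the file field with type attachment; the helper name buildAttachmentDoc, the title field, and the index and type names in the usage comment are made up for illustration.

```javascript
// Hypothetical helper: builds the JSON body for a document whose `file`
// field is assumed to be mapped as type "attachment".
function buildAttachmentDoc(buffer, title) {
  return {
    title: title,
    // The entire file is converted to a single Base64 string here,
    // which is exactly the memory cost described in the question.
    file: buffer.toString('base64'),
  };
}

// Usage sketch (path, index and type names are placeholders; `client` would
// be an instance of the Client class from the `elasticsearch` package):
// const fs = require('fs');
// const doc = buildAttachmentDoc(fs.readFileSync('/path/to/report.pdf'), 'report');
// client.index({ index: 'docs', type: 'doc', body: doc });
```

Because the Base64 string has to sit inside the JSON body, the file cannot be handed over as a stream on this route.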

What some applications do (for example, Search Technology's Aspire) is run the binaries through Tika locally, extract the text, and upload just that text with the documents to index.

It might not be the answer you are looking for, but you really have just two options: use the Elastic plugin (and convert the binary to Base64 in your code before uploading the document to Elastic), or parse the binary and extract the text in your own code and then upload just that text to Elastic. The former is easier; the latter gives you more control over the process.
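The second option could be sketched roughly as follows. This assumes an Apache Tika Server is running locally on its default port 9998 (the answer only says to run Tika locally, so the server setup is an assumption), and the names extractText, buildTextDoc, filename and content are hypothetical. The file is streamed to Tika, so it is never Base64-encoded or held in memory as a whole; only the extracted text is.

```javascript
const fs = require('fs');
const http = require('http');
const path = require('path');

// Streams a file to a Tika server and resolves with the extracted plain text.
// Assumption: Tika Server is reachable at host:port and accepts PUT /tika.
function extractText(filePath, { host = 'localhost', port = 9998 } = {}) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      { host, port, path: '/tika', method: 'PUT', headers: { Accept: 'text/plain' } },
      (res) => {
        const chunks = [];
        res.on('data', (chunk) => chunks.push(chunk));
        res.on('end', () => {
          const body = Buffer.concat(chunks).toString('utf8');
          if (res.statusCode !== 200) {
            reject(new Error(`Tika returned HTTP ${res.statusCode}`));
          } else {
            resolve(body);
          }
        });
      }
    );
    req.on('error', reject);

    // Pipe the file into the request body chunk by chunk.
    const input = fs.createReadStream(filePath);
    input.on('error', (err) => {
      req.destroy(err);
      reject(err);
    });
    input.pipe(req);
  });
}

// Hypothetical helper: the document that gets indexed contains only text.
function buildTextDoc(filePath, text) {
  return { filename: path.basename(filePath), content: text.trim() };
}

// Usage sketch (placeholders throughout; `client` would be an instance of
// the Client class from the `elasticsearch` package):
// extractText('/path/to/report.pdf').then((text) =>
//   client.index({ index: 'docs', type: 'doc', body: buildTextDoc('/path/to/report.pdf', text) })
// );
```

With this split the index needs no attachment mapping at all, since content is an ordinary text field, and you decide in your own code what to do with files Tika cannot parse.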
