
How to index source code with ElasticSearch

I need to provide full-text search on JavaScript source files, with highlighting of the results.

My question is: what combination of existing ElasticSearch tokenizers and analyzers would be best for this?

Interesting question, but I'm not aware of an out-of-the-box solution. You can use the WordDelimiter token filter: you can specify, e.g., that the underscore be handled as a digit, and then functions like hello_world (or helloWorld, if camel-case splitting is enabled) will be searchable via hello or world.
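To make the idea concrete, here is a minimal sketch of index settings that wire the `word_delimiter` token filter into a custom analyzer. The names `source_code` and `code_word_delimiter` are made up for illustration, and the choice of the `whitespace` tokenizer is my assumption, not something the answer prescribes. The small `split_identifier` helper is only a rough pure-Python approximation of how identifiers might be split; it is not what Lucene actually runs.

```python
import re

# Hypothetical index settings: the analyzer and filter names are illustrative.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "code_word_delimiter": {
                    "type": "word_delimiter",
                    "split_on_case_change": True,  # helloWorld -> hello, World
                    "generate_word_parts": True,
                    "preserve_original": True,     # also keep the unsplit token
                }
            },
            "analyzer": {
                "source_code": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["code_word_delimiter", "lowercase"],
                }
            },
        }
    }
}


def split_identifier(identifier):
    """Approximate the sub-word split: break on non-alphanumerics and case changes."""
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [part.lower() for part in parts]
```

The settings dictionary could be passed as the body when creating an index. It is worth checking the real token output with the `_analyze` API before relying on it.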

But I doubt the results will be sufficient. You'll probably have to implement a source code analyzer yourself, or use code that extracts the syntax tree so that you can index method names and bodies into different fields.
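As a rough illustration of indexing names and bodies into different fields, the sketch below pulls `function name(...) { ... }` declarations out of JavaScript text with a regular expression and naive brace counting. This is a stand-in for a real syntax-tree extractor, not a replacement: it miscounts braces inside strings, comments, and regex literals, and it ignores arrow functions and methods. The field names `function_name` and `function_body` are hypothetical.

```python
import re

# Matches only plain declarations such as: function helloWorld(a, b) {
FUNCTION_HEADER = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\([^)]*\)\s*\{")


def extract_functions(source):
    """Return one dict per function declaration, with name and body split apart."""
    docs = []
    for match in FUNCTION_HEADER.finditer(source):
        depth = 1
        pos = match.end()
        # Walk forward until the brace opened by the header is closed.
        while pos < len(source) and depth > 0:
            if source[pos] == "{":
                depth += 1
            elif source[pos] == "}":
                depth -= 1
            pos += 1
        # If the braces never balance, keep everything up to the end of the file.
        body = source[match.end():pos - 1] if depth == 0 else source[match.end():]
        docs.append({
            "function_name": match.group(1),
            "function_body": body.strip(),
        })
    return docs
```

Each returned dictionary could then be indexed as its own document, so that a match on a function name can be weighted differently from a match inside a body.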

You can use the attachment type plugin to load the files into Elasticsearch and let it index them. It can handle the files' metadata as well as index their content.

The plugin's GitHub page includes information on how to highlight matches in the search results.
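For reference, a highlighting request generally looks like the sketch below: a query plus a `highlight` section naming the fields to highlight. The helper name `build_highlight_query` and the default field name `content` are assumptions for illustration; the field the attachment plugin actually exposes depends on your mapping, so check the plugin's documentation.

```python
def build_highlight_query(text, field="content"):
    """Build a search body that matches `text` in `field` and asks for highlights."""
    return {
        "query": {"match": {field: text}},
        # An empty options dict requests highlighting with default settings.
        "highlight": {"fields": {field: {}}},
    }
```

The resulting dictionary would be sent as the body of a `_search` request. Matching fragments should come back under each hit's `highlight` key.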

Unless you want to expose this as a service to somebody, I would recommend installing the InstaSearch plugin in Eclipse; it creates a Lucene index and gives you instantaneous results.

This kind of indexing feature is part of the ElasticSearch configuration for MS Azure DevOps Server. I haven't a clue how it's done, though :/


 