Extending Elasticsearch's standard analyzer to tokenize on additional characters

Question

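I basically want the functionality of the built-in standard analyzer, but with underscores additionally treated as token separators.

Currently, the standard analyzer keeps brown_fox_has as a single token, but I want [brown, fox, has] instead. The simple analyzer loses some of the standard analyzer's functionality, so I'd like to stay as close to standard as possible.

The documentation only shows how to add filters and make other non-tokenizer changes, but I want to keep all of the standard tokenizer's behavior while additionally splitting on underscores.

I could create a character filter that maps _ to - so that the standard tokenizer does the job for me, but is there a better way?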

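            from elasticsearch import Elasticsearch

            # Assumption: a local single-node cluster; adjust the URL as needed.
            es = Elasticsearch("http://localhost:9200")

            es.indices.create(index="mine", body={
                "settings": {
                    "analysis": {
                        "analyzer": {
                            "default": {
                                "type": "custom",
                                # "tokenize_on_chars": ["_"],  # I want this to work with the standard tokenizer, without using char_group
                                "tokenizer": "standard",
                                "filter": ["lowercase"]
                            }
                        }
                    },
                }
            })
            # Analyze a sample string with the index's default analyzer.
            res = es.indices.analyze(index="mine", body={
                "field": "text",
                "text": "the quick brown_fox_has to be split"
            })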
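
To see why the character-filter idea mentioned above yields the desired tokens, here is a rough offline sketch that needs no cluster. It assumes a simplified stand-in for the standard tokenizer (a `\w+` regex, which, like the real tokenizer, does not break on underscores); the helper names are hypothetical and the real tokenizer follows Unicode text segmentation rules that this sketch does not reproduce.

```python
import re
from typing import List


def approx_standard_tokens(text: str) -> List[str]:
    # Simplified stand-in for "standard" tokenizer + "lowercase" filter.
    # \w matches letters, digits and "_", so underscores do not split words.
    return [token.lower() for token in re.findall(r"\w+", text)]


def map_underscores(text: str) -> str:
    # Plays the role of a mapping character filter: "_" becomes a space
    # before the text reaches the tokenizer.
    return text.replace("_", " ")


sample = "the quick brown_fox_has to be split"
before = approx_standard_tokens(sample)
after = approx_standard_tokens(map_underscores(sample))

print(before)  # "brown_fox_has" stays a single token
print(after)   # "brown", "fox" and "has" are separate tokens
```

Because the replacement happens before tokenization, everything else the tokenizer does is left untouched, which is the appeal of this approach.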

Answer 1

Use a mapping character filter to replace underscores with whitespace, and define it together with the standard tokenizer you prefer:

POST /_analyze
{
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "_ => \\u0020"
      ]
    }
  ],
  "tokenizer": "standard",
  "text": "the quick brown_fox_has to be split"
}
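
For the Python client used in the question, the same idea could be wired into the index's default analyzer roughly as follows. This is a sketch under assumptions: the filter name `split_extra_chars` and both helper functions are made up for illustration, and the `body=` argument mirrors the question's client usage (newer client versions may prefer keyword arguments).

```python
def build_index_body(split_chars=("_",)):
    # One mapping rule per extra character: replace it with a space
    # (written as the escape \u0020, as in the _analyze request above).
    mappings = [f"{ch} => \\u0020" for ch in split_chars]
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    # Hypothetical name; any identifier works.
                    "split_extra_chars": {
                        "type": "mapping",
                        "mappings": mappings,
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "char_filter": ["split_extra_chars"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                },
            }
        }
    }


def create_index(es, name="mine"):
    # `es` is assumed to be an elasticsearch.Elasticsearch client instance.
    return es.indices.create(index=name, body=build_index_body())
```

Calling `create_index(es)` and then rerunning the question's `es.indices.analyze` request should show brown, fox and has as separate tokens, while the rest of the standard tokenizer's behavior is unchanged.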
