
Elasticsearch: sorting a field containing special characters, numbers, and letters

I created a case-insensitive analyzer as follows:

PUT /dhruv3
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "keyword",
            "filter": [ "lowercase", "asciifolding" ]
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "about": {
          "type": "string",
          "analyzer": "analyzer_keyword"
        },
        "firsName": {
          "type": "string"
        }
      }
    }
  }
}

and used it in the mapping. The about field is supposed to contain alphanumeric and special characters. Then I inserted some documents with these values in the about field:

1234, `pal, pal, ~pal

Besides searching, I need the results to be sorted. Searching works well, but when I try to sort with

GET dhruv/test/_search
{
  "sort": [
    {
      "about": {
        "order": "asc"
      }
    }
  ]
}

I get the results ordered by the about field as

1234, `pal, pal, ~pal

But I expect special characters first, then numbers, and then letters.

I did some homework and learned that this is because of their ASCII values. So I searched the internet and even tried asciifolding, but it didn't work. I know there is a solution somewhere, but I can't figure it out. Please guide me.

You're right that the sorting behavior you are seeing is due to the ASCII values of these special characters being greater than the ASCII values of digits. To be precise, looking at the ASCII table, we have the following values:

  • 1 has the ASCII value 49
  • ` has the ASCII value 96
  • p has the ASCII value 112
  • ~ has the ASCII value 126
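
To make the values above concrete, here is a small Python sketch (not Elasticsearch code) that looks up the code point of each value's first character and sorts the strings lexicographically. It assumes that plain string comparison in Python is a fair stand-in for how the single lowercase keyword token is ordered for these ASCII-only values.

```python
# The four values from the question's "about" field, in arbitrary order.
values = ["pal", "~pal", "1234", "`pal"]

# Code point of each value's first character (matches the ASCII table for these).
first_char_codes = {v: ord(v[0]) for v in values}

# Plain lexicographic sorting compares code points character by character.
by_code_point = sorted(values)

print(first_char_codes)
print(by_code_point)
```

The sorted list reproduces the order reported in the question: the digit-leading value first, then the backtick, the letter, and the tilde.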

The asciifolding token filter simply transforms letters, digits, and symbols which are NOT in the ASCII table (i.e. outside the Basic Latin block, code points 0 to 127) into their ASCII equivalents, if one exists (e.g. é, è, ë, ê are transformed to e). Since all the characters above are already in the ASCII table, the filter leaves them untouched, so this is not what you're looking for.
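
The effect can be approximated outside Elasticsearch. The sketch below is an assumption-laden stand-in, not the real filter: it uses Python's unicodedata NFKD decomposition and drops whatever is not ASCII, which behaves like folding for accented Latin letters but is not the mapping table that Lucene uses. The helper name fold_to_ascii is hypothetical.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Rough approximation of ASCII folding: decompose, then drop non-ASCII marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Accented letters lose their diacritics.
accented = fold_to_ascii("éèëê")

# Values that are already pure ASCII pass through unchanged,
# so their relative sort order cannot change either.
unchanged = [fold_to_ascii(v) for v in ["1234", "`pal", "pal", "~pal"]]

print(accented)
print(unchanged)
```

Because the question's values pass through unchanged, folding has no influence on how they sort.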

If you want the special characters to come first in the sorted results, there are several ways to do it.

One way to achieve it is to give values that start with a special character a negative sort key, so that they always come before values that start with a letter or digit, and then use script sorting:

{
  "sort": [
    {
      "_script": {
        "script": "return doc['about'].value.chars[0].isLetterOrDigit() ? 1 : -1",
        "type": "number",
        "order": "asc"
      }
    }
  ]
}
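
To see what this script does to the sample values, here is a Python simulation of the same bucketing. It is only a sketch: str.isalnum() is assumed to be an adequate substitute for isLetterOrDigit() on these ASCII values, and the helper name bucket is hypothetical.

```python
def bucket(value: str) -> int:
    """Mirror the script: 1 if the first character is a letter or digit, else -1."""
    return 1 if value[0].isalnum() else -1


values = ["1234", "`pal", "pal", "~pal"]

# Python's sort is stable, so values in the same bucket keep their input order.
ranked = sorted(values, key=bucket)

print([(v, bucket(v)) for v in values])
print(ranked)
```

Note that the script only produces two buckets. The order inside a bucket in this simulation comes from Python's stable sort; a query would need a secondary sort on the field itself to make the order within each bucket deterministic.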

The asciifolding filter has nothing to do with what you're trying to achieve. The ASCIIFoldingFilter.java source has a wealth of information: the filter merely converts Unicode characters like \~ to their ASCII equivalents when one is available.

Adding to @Val's answer: in case you want the values sorted in the order of special characters, then numbers, then letters, you may want to consider using:

GET /ascii/test/_search
{
  "sort": {
    "_script": {
      "script": "r = doc['about'].value.chars[0]; return !r.isLetter() ? r.isDigit() ? 1 : -1 : 2",
      "type": "number",
      "order": "asc"
    }
  }
}

Also, note that this sorting may not be perfect, since the script only looks at the first character. You may want to write a more robust script that takes the entire value into account.
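
As an illustration of what handling the entire value could look like, here is a Python sketch of a whole-string sort key. It is not an Elasticsearch script and would have to be ported to the scripting language in use; the names char_key and sort_key and the extra sample values are hypothetical. Each character is ranked as special (0), digit (1), or letter (2), with its code point as a tie-breaker.

```python
from typing import List, Tuple


def char_key(ch: str) -> Tuple[int, int]:
    """Rank one character: specials first, then digits, then letters."""
    if ch.isdigit():
        rank = 1
    elif ch.isalpha():
        rank = 2
    else:
        rank = 0
    # The code point breaks ties between characters of the same class.
    return (rank, ord(ch))


def sort_key(value: str) -> List[Tuple[int, int]]:
    """Build a key from every character, not just the first one."""
    return [char_key(ch) for ch in value]


values = ["pal", "p1l", "1234", "~pal", "p~l", "`pal"]
ordered = sorted(values, key=sort_key)

print(ordered)
```

Because every position contributes to the key, values that share a first character (here p~l, p1l, and pal) are also ordered with specials before digits before letters.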

This gist is a good example of what you can achieve using embedded scripts.
