
How to handle nulls in an Elasticsearch index

I have a SQL table that I am exporting to Elasticsearch.

One of the columns is a nullable numeric field, and some of the records have nulls in it.

When we try to index the table, we get this error:

One of the ETL (BigQuery -> ElasticSearch) jobs for Table : MLS has been ES Failed Chunk of 10000 from index 20000 possibly due to incompatible objects.

Failing BigQuery Table: MLS
Stack Trace of the error:
Traceback (most recent call last):
  File "/Users/asif/zodiacbackend/zodiacbackend/tasks.py", line 205, in insertIntoES
    helpers.bulk(es, doc_generator(dataframe,table))
  File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 300, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 230, in streaming_bulk
    **kwargs
  File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk
    raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('2 document(s) failed to index.', [{'index': {'_index': 'mls', '_type': 'mls', '_id': 'b100qHABEFI45Lp-z3Om', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'mapper [Lot_Size_Sq_Ft] of different type, current_type [text], merged_type [long]'}, 'data': { 'Lot_Size_Sq_Ft': Decimal('13504')}}}])

How do I get the system to recognize nulls?

The answer from user WittyID misses some important points:

  1. The null_value must be of the same data type as the field. In his example he declared an integer field but defined NULL as the null_value, which would throw a json_parse_exception. The official documentation flags this as important:

The null_value needs to be the same data type as the field. For instance, a long field cannot have a string null_value.

  2. The null_value only influences how data is indexed; it doesn't modify the _source document. Whatever you passed is what gets stored in _source, not the value given in the null_value parameter, and at query time you need to search for the null_value itself.
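The first point can be checked before a mapping is ever sent to the cluster. The following is a minimal sketch, not part of Elasticsearch or its client: the function name find_null_value_mismatches and the type table are assumptions made for illustration, and the table covers only a few common field types. Elasticsearch performs its own validation when the index is created.

```python
# Illustrative pre-flight check only; the type table is an assumption
# that covers a handful of common Elasticsearch field types.
EXPECTED_PYTHON_TYPES = {
    "integer": (int,),
    "long": (int,),
    "float": (int, float),
    "double": (int, float),
    "keyword": (str,),
    "boolean": (bool,),
}


def find_null_value_mismatches(mapping):
    """Return the names of fields whose null_value does not fit the declared type."""
    mismatches = []
    properties = mapping.get("mappings", {}).get("properties", {})
    for name, spec in properties.items():
        if "null_value" not in spec:
            continue
        expected = EXPECTED_PYTHON_TYPES.get(spec.get("type"))
        if expected is None:
            continue  # field type not covered by this sketch
        value = spec["null_value"]
        # bool is a subclass of int in Python, so reject it explicitly
        # for numeric fields.
        if isinstance(value, bool) and bool not in expected:
            mismatches.append(name)
        elif not isinstance(value, expected):
            mismatches.append(name)
    return mismatches


good = {"mappings": {"properties": {
    "my_signed_integer": {"type": "integer", "null_value": -1}}}}
bad = {"mappings": {"properties": {
    "my_signed_integer": {"type": "integer", "null_value": "NULL"}}}}

print(find_null_value_mismatches(good))  # []
print(find_null_value_mismatches(bad))   # ['my_signed_integer']
```

The check is deliberately stricter than necessary in some cases; it is only meant to catch the obvious mistake of giving a numeric field a string null_value.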

In short, ES doesn't index null, so you define a custom value that stands in for null and then use it to index and query the null values. The whole thing is easiest to explain with the example below, which anybody can try:

Create index (note that the mapping defines -1 as the value indexed in place of null)

{
  "mappings": {
    "properties": {
      "my_signed_integer": {
        "type": "integer",
        "null_value": -1
      }
    }
  }
}

Index doc

  1. Store a document with a null integer

    { "my_signed_integer" : null }

If you get this doc from ES, it is returned as below. As explained earlier, the value in _source is stored as null, not -1:

{
   "_index": "so-6053847",
   "_type": "_doc",
   "_id": "1",
   "_version": 1,
   "_seq_no": 0,
   "_primary_term": 1,
   "found": true,
   "_source": {
      "my_signed_integer": null
   }
}

  2. Index a non-negative value

    { "my_signed_integer" : 10 }

Search query to fetch the integers that had null values. Note that you have to search for the same null_value (-1) that the mapping defines:

{
  "query": {
    "term": {
      "my_signed_integer": -1
    }
  }
}

Result (notice that _source shows null, not -1):

 "hits": [
         {
            "_index": "so-6053847",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
               "my_signed_integer": null
            }
         }
      ]

Search query for other (non-null) numbers, in our case 10

{
  "query": {
    "term": {
      "my_signed_integer": 10
    }
  }
}

Result (here _source matches the indexed value for this doc):

"hits": [
         {
            "_index": "so-6053847",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
               "my_signed_integer": 10
            }
         }
      ]
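To make the difference between what is indexed and what is stored in _source concrete, here is a toy model in Python. It is not Elasticsearch and says nothing about how Elasticsearch is implemented; the ToyIndex class and its methods are invented for this sketch and only mimic the behaviour walked through above.

```python
class ToyIndex:
    """Toy stand-in for a single index; NOT how Elasticsearch works internally."""

    def __init__(self, null_values):
        self.null_values = null_values  # field name -> null_value from the mapping
        self.sources = {}               # doc id -> original document (_source)
        self.terms = {}                 # (field, value) -> set of doc ids

    def index(self, doc_id, doc):
        self.sources[doc_id] = doc  # _source is stored untouched
        for field, value in doc.items():
            if value is None:
                if field not in self.null_values:
                    continue  # without a null_value, a null is not searchable
                value = self.null_values[field]  # index the stand-in instead
            self.terms.setdefault((field, value), set()).add(doc_id)

    def term(self, field, value):
        """Mimic a term query: look up the indexed value, return the stored source."""
        ids = sorted(self.terms.get((field, value), set()))
        return [{"_id": i, "_source": self.sources[i]} for i in ids]


idx = ToyIndex({"my_signed_integer": -1})
idx.index("1", {"my_signed_integer": None})
idx.index("2", {"my_signed_integer": 10})

print(idx.term("my_signed_integer", -1))
# [{'_id': '1', '_source': {'my_signed_integer': None}}]
print(idx.term("my_signed_integer", 10))
# [{'_id': '2', '_source': {'my_signed_integer': 10}}]
```

The document with the null is found by searching for -1, yet its stored source still holds the null, which is the same pattern as in the two search results above.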

You're dealing with a common ES head-scratcher. Elasticsearch doesn't index null values (not just numeric nulls). You need to specify in your index mapping how you want any null values to be indexed. Something like this:

{
  "mappings": {
    "properties": {
      "nullable_numeric": {
        "type": "integer",
        "null_value": -1
      },
      "nullable_text": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

Once you do this, ES knows how to index those fields. Note that you don't need to change your raw data; you only tell ES how to index nulls for search, which, by the way, doesn't affect the documents returned when you query ES. For string fields, null_value is supported on keyword fields, not on analyzed text fields.
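On the export side, the null_value setting can only apply if missing values actually reach ES as JSON null. The sketch below shows one way a generator for helpers.bulk might normalize values first. The body of the question's doc_generator is not shown, so everything here is an assumption: the helper to_json_value is hypothetical, the input is taken to be an iterable of plain dicts rather than a dataframe, and NaN is assumed to mean "missing".

```python
import math
from decimal import Decimal


def to_json_value(value):
    """Convert one cell into a value that serializes cleanly to JSON."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None  # assumption: NaN stands for a missing value
    if isinstance(value, Decimal):
        if not value.is_finite():
            return None  # Decimal NaN or Infinity has no JSON number form
        # Keep whole numbers as int, everything else as float.
        if value == value.to_integral_value():
            return int(value)
        return float(value)
    return value


def doc_generator(rows, index_name):
    """Yield bulk actions; `rows` is an iterable of dicts (hypothetical input shape)."""
    for row in rows:
        yield {
            "_index": index_name,
            "_source": {key: to_json_value(val) for key, val in row.items()},
        }


rows = [
    {"Lot_Size_Sq_Ft": Decimal("13504")},
    {"Lot_Size_Sq_Ft": None},
    {"Lot_Size_Sq_Ft": float("nan")},
]
actions = list(doc_generator(rows, "mls"))
print(actions)
```

With a pandas dataframe, the rows could come from something like dataframe.to_dict("records"), and the resulting generator would be passed to helpers.bulk(es, ...) as in the question's traceback. This normalization does not by itself resolve a mapping conflict such as the text versus long mismatch reported in the error; it only ensures that missing values arrive as null.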
