
How to search through data with an arbitrary number of fields?

I have a web-form builder for science events. An event moderator creates a registration form with an arbitrary number of boolean, integer, enum, and text fields.

The created form is used to:

  • register a new member for the event;
  • search through the registered members.

What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch a good fit for this task?

Elasticsearch automatically detects the type of each field's content in order to index it correctly, even if no mapping has been defined beforehand. So yes: Elasticsearch suits these cases well.

However, you may want to fine-tune this behavior, or the default mapping applied by Elasticsearch may not match what you need. In that case, take a look at the default mapping or, for even finer control, the dynamic templates feature.

If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.

This case and a suggested solution are covered in this article on common problems with Elasticsearch.

Essentially, you want everything that can be user-defined to be stored as a value rather than as a field name. Using nested documents, you can have a key field and differently mapped value fields to achieve much the same result.
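A minimal sketch of what such a mapping might look like, written as a JavaScript object that could be sent as the body of an index-creation request. The names flatData, data, key, type, key_type and the value_* fields are illustrative assumptions, as is the choice of which value types to map; the point is only that the mapping stays fixed no matter how many form fields users define.

```javascript
// Hypothetical index body: the mapping is fixed, user-defined field names
// end up as *values* of flatData.key instead of as new mapping entries.
const indexBody = {
  mappings: {
    properties: {
      // Original document kept for display only; not parsed or indexed.
      data: { type: "object", enabled: false },
      flatData: {
        type: "nested", // each key/value entry is matched as its own unit
        properties: {
          key: { type: "keyword" },
          type: { type: "keyword" },
          key_type: { type: "keyword" },
          // One value field per supported type, each mapped differently.
          value_string: {
            type: "text",
            fields: { keyword: { type: "keyword", ignore_above: 256 } }
          },
          value_long: { type: "long" },
          value_boolean: { type: "boolean" }
        }
      }
    }
  }
};

console.log(JSON.stringify(indexBody, null, 2));
```

With a mapping of this shape, a form with a hundred fields adds a hundred nested entries to each document but no new entries to the cluster state.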

I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.

The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/

In short, you will need to do the following steps to get what you want:

  1. Create a special index described in the post.
  2. Flatten the data you want to index using the flattenData function:
     https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4
  3. Create a document with the original and flattened data and index it into Elasticsearch:

     { "data": { ... }, "flatData": [ ... ] }

  4. Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
  5. Execute queries on the flatData object to find what you need.
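Steps 2 and 3 can be sketched roughly as follows. This is not the code from the gist: flattenDataSketch and toIndexedDocument are hypothetical names, and the sketch assumes flat objects whose values are strings, integers, or booleans (anything else is skipped).

```javascript
// Sketch only, NOT the gist's implementation. Turns each top-level property
// into a { key, type, key_type, value_<type> } entry.
function flattenDataSketch(data) {
  const flat = [];
  for (const [key, value] of Object.entries(data)) {
    let type;
    if (typeof value === "string") type = "string";
    else if (typeof value === "boolean") type = "boolean";
    else if (Number.isInteger(value)) type = "long";
    else continue; // floats, nested objects, arrays, null: out of scope here
    flat.push({
      key,
      type,
      key_type: `${key}.${type}`,
      ["value_" + type]: value // e.g. value_string, value_long
    });
  }
  return flat;
}

// Step 3: keep the original data next to its flattened form.
function toIndexedDocument(data) {
  return { data, flatData: flattenDataSketch(data) };
}

const sampleDocument = toIndexedDocument({ name: "Bob", age: 22 });
console.log(JSON.stringify(sampleDocument, null, 2));
```

Keeping the type inside key_type means two forms can reuse a field name with different types without the entries colliding.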

Example

Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:

  • name string
  • age long
  • sex long - 0 for male, 1 for female

In addition to this data, the related event probably has some sort of id; let's call it eventId. So the final document could look like this:

{
    "eventId": "2T73ZT1R463DJNWE36IA8FEN",
    "name": "Bob",
    "age": 22,
    "sex": 0
}

Now, before we index this document, we will flatten it using the flattenData function:

flattenData(document);

This will produce the following array:

[
    {
        "key": "eventId",
        "type": "string",
        "key_type": "eventId.string",
        "value_string": "2T73ZT1R463DJNWE36IA8FEN"
    },
    {
        "key": "name",
        "type": "string",
        "key_type": "name.string",
        "value_string": "Bob"
    },
    {
        "key": "age",
        "type": "long",
        "key_type": "age.long",
        "value_long": 22
    },
    {
        "key": "sex",
        "type": "long",
        "key_type": "sex.long",
        "value_long": 0
    }
]

Then we will wrap this data in a document as I've shown before and index it.
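One way to do the wrapping and prepare the indexing request is to build a body for Elasticsearch's _bulk endpoint, which expects newline-delimited JSON: an action line followed by the document source. The helper name toBulkBody and the index name event-members are assumptions for illustration; sending the body (a POST to /_bulk with Content-Type application/x-ndjson) is left out of the sketch.

```javascript
// Builds an NDJSON body for the _bulk endpoint.
// `indexName` is a hypothetical index created beforehand.
function toBulkBody(indexName, docs) {
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName } })); // action line
    lines.push(JSON.stringify(doc)); // source line
  }
  return lines.join("\n") + "\n"; // _bulk requires a trailing newline
}

// The wrapped document: original data plus the flattened array shown above.
const wrappedBob = {
  data: { eventId: "2T73ZT1R463DJNWE36IA8FEN", name: "Bob", age: 22, sex: 0 },
  flatData: [
    { key: "eventId", type: "string", key_type: "eventId.string", value_string: "2T73ZT1R463DJNWE36IA8FEN" },
    { key: "name", type: "string", key_type: "name.string", value_string: "Bob" },
    { key: "age", type: "long", key_type: "age.long", value_long: 22 },
    { key: "sex", type: "long", key_type: "sex.long", value_long: 0 }
  ]
};

const bulkBody = toBulkBody("event-members", [wrappedBob]);
console.log(bulkBody);
```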

Then the second event moderator creates another form that has a new field, a field with the same name and type as before, and a field with the same name but a different type:

  • name string
  • city string
  • sex string - "male" or "female"

This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".

Let's try to flatten the data submitted by this form:

flattenData({
    "eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
    "name": "Alice",
    "city": "New York",
    "sex": "female"
});

This will produce the following data:

[
    {
        "key": "eventId",
        "type": "string",
        "key_type": "eventId.string",
        "value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
    },
    {
        "key": "name",
        "type": "string",
        "key_type": "name.string",
        "value_string": "Alice"
    },
    {
        "key": "city",
        "type": "string",
        "key_type": "city.string",
        "value_string": "New York"
    },
    {
        "key": "sex",
        "type": "string",
        "key_type": "sex.string",
        "value_string": "female"
    }
]
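Note that sex now appears under two distinct key_type values, sex.long and sex.string. As a sketch of the optional step 4, once such documents are indexed, a request body along these lines could list which key_type values exist. It assumes flatData is mapped as nested and key_type as a keyword field; the aggregation names fields and key_types and the size of 100 are arbitrary choices.

```javascript
// Hypothetical search body: counts nested entries per key_type.
const listFieldsRequest = {
  size: 0, // return aggregation results only, no hits
  aggs: {
    fields: {
      nested: { path: "flatData" }, // step into the nested entries
      aggs: {
        key_types: {
          terms: { field: "flatData.key_type", size: 100 }
        }
      }
    }
  }
};

console.log(JSON.stringify(listFieldsRequest, null, 2));
```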

Then, after wrapping the flattened data in a document and indexing it into Elasticsearch, we can execute complicated queries.

For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "flatData",
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"flatData.key": "eventId"}},
                                    {"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
                                ]
                            }
                        }
                    }
                },
                {
                    "nested": {
                        "path": "flatData",
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"flatData.key": "name"}},
                                    {"match": {"flatData.value_string": "bob"}}
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }
}
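Each condition gets its own nested clause because a nested query matches against one flatData entry at a time, so the key and its value have to be tested inside the same clause. Since the clauses all have the same shape, they can be generated. The helpers nestedFieldClause and buildMemberQuery below are hypothetical names, a sketch of one way to build the query above from a list of conditions:

```javascript
// One nested clause per condition: the key and the value must match
// within the same flatData entry.
function nestedFieldClause(key, valueField, value) {
  return {
    nested: {
      path: "flatData",
      query: {
        bool: {
          must: [
            { term: { "flatData.key": key } },
            { match: { [`flatData.${valueField}`]: value } }
          ]
        }
      }
    }
  };
}

// All conditions must hold for the same member document.
function buildMemberQuery(conditions) {
  return {
    query: {
      bool: {
        must: conditions.map((c) => nestedFieldClause(c.key, c.valueField, c.value))
      }
    }
  };
}

const bobQuery = buildMemberQuery([
  { key: "eventId", valueField: "value_string.keyword", value: "2T73ZT1R463DJNWE36IA8FEN" },
  { key: "name", valueField: "value_string", value: "bob" }
]);
console.log(JSON.stringify(bobQuery, null, 2));
```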
