
Elasticsearch fails to return some documents

I have this data:

{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}

and I'm running this Python code to index it:

    # Assumes module-level imports: `import json` and
    # `from elasticsearch.helpers import bulk`.
    def create_index(self, file_path):
        """
        Takes the path to a file containing one JSON-formatted bulk action
        per line and indexes the documents into Elasticsearch.
        """
        print('Creating index "{}"'.format(INDEX_NAME))

        request_body = {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "motorcycle": {
                    "properties": {
                        # Both fields are analyzed with the built-in Swedish analyzer.
                        "location": {"type": "text", "analyzer": "swedish"},
                        "description": {"type": "text", "analyzer": "swedish"}
                    }
                }
            }
        }
        self.es.indices.create(index=INDEX_NAME, body=request_body)
        # Read from the file_path argument and close the file when done.
        with open(file_path, "r") as f_in:
            actions = (json.loads(line) for line in f_in)
            print("Performed bulk index: {}".format(bulk(self.es, actions)))
        self.es.indices.refresh(index=INDEX_NAME)

Now I'm trying to query the index using Postman for all documents with location:Uppsala (the location of the first document). I ran the same query from Python with the same result:

POST to localhost:9200/simple/_search:
{
    "query": {
        "bool": {
            "filter": [

                {
                    "term": {
                        "location": "uppsala"
                    }
                }
            ]
        }
    }
}

It returns nothing. The same thing happens if I change the location to uddevalla, which is also in the original data (second document).

However, if I change the location to norrköping, it returns the third document, as it should.

What is the reason behind this erratic behaviour?
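
One possible explanation, which the post itself does not confirm: a term query looks up its value in the index exactly as given, while the location field is indexed through the swedish analyzer, which lowercases and stems each token. If stemming shortens a place name, the unstemmed value is no longer among the indexed terms and the term filter finds nothing. A match query runs the query text through the field's analyzer first, so it compares like with like. The sketch below only builds the two request bodies; the helper names are hypothetical and not part of the original code.

```python
import json


def term_filter_query(field, value):
    """Body for a term filter: the value is looked up in the index as-is."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


def match_filter_query(field, value):
    """Body for a match filter: the value is analyzed with the field's analyzer first."""
    return {"query": {"bool": {"filter": [{"match": {field: value}}]}}}


if __name__ == "__main__":
    # The printed JSON can be pasted into Postman as the POST body
    # for localhost:9200/simple/_search.
    print(json.dumps(match_filter_query("location", "Uppsala"), indent=4))
```

If the match variant returns the Uppsala document where the term variant does not, that points to analysis rather than to missing data.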

UPDATE: Tried with a slightly larger data file:

{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:57", "price": 11000, "sellerName": "Tommy Antfolk", "description": "Ett bra tillf\u00e4lle att skaffa hoj med h\u00e4rlig kombination av Touring och sport till ett \u00f6verkomligt pris. Suzuki gsx 750 F 1996/1997 Fint skick f\u00f6r sin \u00e5lder Bes till 30/9 2018 Ring f\u00f6r mer info", "location": "V\u00e4rmd\u00f6", "id": 322, "title": "Suzuki Gsx 750 F Sport Touring", "modelYear": 1996, "url": "https://www.blocket.se/stockholm/Suzuki_Gsx_750_F_Sport_Touring_79080891.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:51", "price": 15500, "sellerName": "Ulla Ottosson", "description": "Ett underbart fordon jag \u00e4gt sedan \u00e5r 2000, anv\u00e4nd mest till och fr\u00e5n jobbet. L\u00e4ttk\u00f6rd, l\u00e4ttstartad och nyservad. Har g\u00e5tt endast 1660 mil. Vid n\u00e4rmare 70 \u00e5rs \u00e5lder \u00e4r det dags att ta farv\u00e4l av ett s\u00e5dant fordon!", "location": "Karlstad", "id": 327, "title": "Suzuki Burgman 400 AN", "modelYear": 1999, "url": "https://www.blocket.se/varmland/Suzuki_Burgman_400_AN_79080774.htm?ca=11&w=3", "vehicleType": "Scooter"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:33", "price": 85000, "sellerName": "mikael", "description": "Super fin H-D svart matt lackerad i perfekt skick \u00e5rsmodell 1996. Ny servad och alla slitdelar bytta. Pedant sk\u00f6tt 3400 mil Fler bilder skickas p\u00e5 beg\u00e4ran", "location": "Helsingborg", "id": 334, "title": "Harley-Davidson FXDWG", "modelYear": 1996, "url": "https://www.blocket.se/helsingborg/Harley_Davidson_FXDWG_79080441.htm?ca=11&w=3", "vehicleType": "Touring"}}

Querying for värmdö returns the right document. Querying for karlstad returns nothing (should return 1 hit). Querying for "helsingborg" returns the right document.
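
To see which terms were actually indexed for each location, the `_analyze` API can be asked to run a text through the analyzer of a given field. The following is a minimal sketch, assuming `es` is an `elasticsearch.Elasticsearch` client of the same generation as the indexing code above (one that accepts `body=`); the helper name `indexed_terms` is hypothetical.

```python
def indexed_terms(es, index, field, text):
    """Return the terms that the analyzer configured for `field` produces for `text`."""
    response = es.indices.analyze(index=index, body={"field": field, "text": text})
    return [token["token"] for token in response["tokens"]]


# Usage against a live cluster:
# for place in ["Uppsala", "Uddevalla", "Norrköping", "Värmdö", "Karlstad", "Helsingborg"]:
#     print(place, indexed_terms(es, "simple", "location", place))
```

A location whose printed term differs from its lowercased name would not be found by a term query for that lowercased name.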

UPDATE 2:

The documents that don't show up when they should seem to not show up for any query. For example, this query:

{
    "query": {
        "bool": {
            "filter": [],
            "must": {
                "multi_match": {
                    "fields": [
                        "title^1.0",
                        "description"
                    ],
                    "operator": "or",
                    "query": "suzuki",
                    "type": "cross_fields"
                }
            }
        }
    }
}

only returns one result (the one with location:Värmdö), while it should in fact return two (the one with location:Karlstad isn't returned).
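
Since some documents seem to be missing for every query, it may be worth checking whether they reached the index at all. The sketch below compares the ids in the data file with the ids in a search response; it assumes the response comes from something like `es.search(index="simple", body={"query": {"match_all": {}}, "size": 100})`, and the helper names are hypothetical.

```python
import json


def expected_ids(lines):
    """Collect the `id` of every bulk action in an iterable of JSON lines."""
    return {json.loads(line)["_source"]["id"] for line in lines if line.strip()}


def returned_ids(search_response):
    """Collect the `id` of every hit in a search response body."""
    return {hit["_source"]["id"] for hit in search_response["hits"]["hits"]}


def missing_ids(lines, search_response):
    """Ids present in the data file but absent from the search response."""
    return sorted(expected_ids(lines) - returned_ids(search_response))
```

An empty list from `missing_ids` for a match_all response would mean every document was indexed, which would leave the query or the analysis as the place to look.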

Changing the index creation code to the following, which maps location and vehicleType as keyword fields instead of analyzed text, solved the problem:

    # Assumes module-level imports: `import json` and
    # `from elasticsearch.helpers import bulk`.
    def create_index(self, file_path):
        """
        Takes the path to a file containing one JSON-formatted bulk action
        per line and indexes the documents into Elasticsearch.
        """
        print('Creating index "{}"'.format(INDEX_NAME))

        request_body = {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "motorcycle": {
                    "properties": {
                        # keyword fields are indexed verbatim, without analysis.
                        "location": {"type": "keyword"},
                        "vehicleType": {"type": "keyword"},
                        "description": {"type": "text", "analyzer": "swedish"}
                    }
                }
            }
        }
        self.es.indices.create(index=INDEX_NAME, body=request_body)
        # Read from the file_path argument and close the file when done.
        with open(file_path, "r") as f_in:
            actions = (json.loads(line) for line in f_in)
            print("Performed bulk index: {}".format(bulk(self.es, actions)))
        self.es.indices.refresh(index=INDEX_NAME)
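
One thing to keep in mind with this mapping: a keyword field stores each value as a single unmodified term, so a term filter only matches the exact stored value, including case ("Uppsala", not "uppsala"). If case-insensitive filtering is wanted, a lowercase normalizer on the keyword field is one option. The sketch below builds such a request body; the function name `keyword_index_body` and the normalizer name `lowercase_normalizer` are hypothetical, and the typed mapping assumes the same Elasticsearch generation as the code above.

```python
def keyword_index_body(doc_type="motorcycle"):
    """Build an index body whose keyword fields are lowercased at index and search time."""
    keyword = {"type": "keyword", "normalizer": "lowercase_normalizer"}
    return {
        "settings": {
            "index": {"number_of_shards": 1, "number_of_replicas": 0},
            "analysis": {
                "normalizer": {
                    # A custom normalizer that only lowercases the whole value.
                    "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
                }
            },
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "location": dict(keyword),
                    "vehicleType": dict(keyword),
                    "description": {"type": "text", "analyzer": "swedish"},
                }
            }
        },
    }
```

The returned dictionary could be passed as `body` to `self.es.indices.create` in place of `request_body`.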
