
Elasticsearch fails to return some documents

I have this data:

{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}

I'm running this Python code to index it:

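    import json

    from elasticsearch.helpers import bulk

    def create_index(self, file_path):
        """
            Takes path to file containing JSON-formatted data
            and indexes into Elasticsearch index.
        """
        print('Creating index "{}"'.format(INDEX_NAME))

        request_body = {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "motorcycle": {
                    "properties": {
                        # Both fields are analyzed text: lowercased and stemmed
                        # by the swedish analyzer at index time.
                        "location": {
                            "type": "text",
                            "analyzer": "swedish"
                        },
                        "description": {
                            "type": "text",
                            "analyzer": "swedish"
                        }
                    }
                }
            }
        }
        self.es.indices.create(index=INDEX_NAME, body=request_body)
        # Each non-blank line is one bulk action (_index/_type/_source);
        # the with-block closes the file once bulk() has consumed it.
        with open(file_path, "r") as f_in:
            actions = (json.loads(line) for line in f_in if line.strip())
            print("Performed bulk index: {}".format(bulk(self.es, actions)))
        self.es.indices.refresh(index=INDEX_NAME)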
        self.es.indices.refresh(index = "simple")

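Now I try to use Postman to query the index for all documents with location: Uppsala (the location of the first object). I ran the same query from Python and got the same result: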

POST to localhost:9200/simple/_search:
{
    "query": {
        "bool": {
            "filter": [

                {
                    "term": {
                        "location": "uppsala"
                    }
                }
            ]
        }
    }
}

It returns nothing. The same happens if I change the location to "uddevalla", which is also in the original data (the second document).

However, if I change the location to "norrköping", it returns the third document, as it should.

What is the reason behind this inconsistent behavior?

Update: I tried with a slightly larger data file:

{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:57", "price": 11000, "sellerName": "Tommy Antfolk", "description": "Ett bra tillf\u00e4lle att skaffa hoj med h\u00e4rlig kombination av Touring och sport till ett \u00f6verkomligt pris. Suzuki gsx 750 F 1996/1997 Fint skick f\u00f6r sin \u00e5lder Bes till 30/9 2018 Ring f\u00f6r mer info", "location": "V\u00e4rmd\u00f6", "id": 322, "title": "Suzuki Gsx 750 F Sport Touring", "modelYear": 1996, "url": "https://www.blocket.se/stockholm/Suzuki_Gsx_750_F_Sport_Touring_79080891.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:51", "price": 15500, "sellerName": "Ulla Ottosson", "description": "Ett underbart fordon jag \u00e4gt sedan \u00e5r 2000, anv\u00e4nd mest till och fr\u00e5n jobbet. L\u00e4ttk\u00f6rd, l\u00e4ttstartad och nyservad. Har g\u00e5tt endast 1660 mil. Vid n\u00e4rmare 70 \u00e5rs \u00e5lder \u00e4r det dags att ta farv\u00e4l av ett s\u00e5dant fordon!", "location": "Karlstad", "id": 327, "title": "Suzuki Burgman 400 AN", "modelYear": 1999, "url": "https://www.blocket.se/varmland/Suzuki_Burgman_400_AN_79080774.htm?ca=11&w=3", "vehicleType": "Scooter"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:33", "price": 85000, "sellerName": "mikael", "description": "Super fin H-D svart matt lackerad i perfekt skick \u00e5rsmodell 1996. Ny servad och alla slitdelar bytta. Pedant sk\u00f6tt 3400 mil Fler bilder skickas p\u00e5 beg\u00e4ran", "location": "Helsingborg", "id": 334, "title": "Harley-Davidson FXDWG", "modelYear": 1996, "url": "https://www.blocket.se/helsingborg/Harley_Davidson_FXDWG_79080441.htm?ca=11&w=3", "vehicleType": "Touring"}}

Querying "värmdö" returns the correct document. Querying "karlstad" returns nothing (it should return 1 hit). Querying "Helsingborg" returns the correct document.
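A likely explanation for this pattern (uppsala, uddevalla and karlstad fail; norrköping, värmdö and helsingborg succeed): "location" is a text field analyzed with the swedish analyzer, which lowercases and stems each token at index time, whereas a term query compares the query string verbatim against the stored tokens. The sketch below is a simplified, hypothetical reimplementation of step 1 of the Snowball Swedish stemmer only; the real analyzer also uses Lucene's standard tokenizer, a stop-word filter and further stemming steps. The helper names are made up for illustration, and the code is meant to show the mismatch, not to reproduce Elasticsearch exactly.

```python
import re

VOWELS = set("aeiouyäåö")
# Step 1 suffixes of the Snowball Swedish stemmer, longest first.
STEP1_SUFFIXES = sorted(
    """a arna erna heterna orna ad e ade ande arne are aste en anden aren
    heten ern ar er heter or as arnas ernas ornas es ades andes ens arens
    hetens erns at andet het ast""".split(),
    key=len,
    reverse=True,
)
VALID_S_ENDING = set("bcdfghjklmnoprtvy")


def r1_start(word):
    """Start of region R1: after the first non-vowel that follows a vowel,
    with at least 3 letters before it."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return max(i + 1, 3)
    return len(word)


def stem_step1(word):
    """Simplified Snowball Swedish step 1: drop the longest suffix inside R1."""
    r1 = r1_start(word)
    for suffix in STEP1_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= r1:
            return word[: -len(suffix)]
    # A final "s" is removed only after a valid s-ending letter.
    if word.endswith("s") and len(word) - 1 >= r1 and word[-2] in VALID_S_ENDING:
        return word[:-1]
    return word


def analyze(text):
    """Rough stand-in for the swedish analyzer: lowercase, split, stem."""
    return [stem_step1(token) for token in re.findall(r"\w+", text.lower())]


def term_matches(field_value, term):
    # A term query is NOT analyzed: the raw term is compared to stored tokens.
    return term in analyze(field_value)


def match_matches(field_value, query):
    # A match query runs the query text through the same analyzer first.
    tokens = set(analyze(field_value))
    return any(t in tokens for t in analyze(query))


for city in ["Uppsala", "Uddevalla", "Norrköping", "Värmdö", "Karlstad", "Helsingborg"]:
    print(city, analyze(city), term_matches(city, city.lower()), match_matches(city, city.lower()))
```

Under these assumptions "Uppsala" is stored as "uppsal" and "Karlstad" as "karlst", so the verbatim term lookups miss, while "norrköping", "värmdö" and "helsingborg" pass through unchanged and match. The real tokens can be checked with the _analyze API, e.g. POST simple/_analyze with {"analyzer": "swedish", "text": "Uppsala"}.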

Update 2:

The documents that are not returned don't seem to show up in any query. For example, this query:

{
    "query": {
        "bool": {
            "filter": [],
            "must": {
                "multi_match": {
                    "fields": [
                        "title^1.0",
                        "description"
                    ],
                    "operator": "or",
                    "query": "suzuki",
                    "type": "cross_fields"
                }
            }
        }
    }
}

only returns one result (location: Värmdö), when it should actually return two (the one with location: Karlstad is not returned).

Changing the index creation code to the following solved the problem:

    import json

    from elasticsearch.helpers import bulk

    def create_index(self, file_path):
        """
            Takes path to file containing JSON-formatted data
            and indexes into Elasticsearch index.
        """
        print('Creating index "{}"'.format(INDEX_NAME))

        request_body = {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "motorcycle": {
                    "properties": {
                        # keyword fields are indexed verbatim (not analyzed), so a
                        # term query must match the exact value, including case.
                        "location": {
                            "type": "keyword"
                        },
                        "vehicleType": {
                            "type": "keyword"
                        },
                        # Free text stays analyzed for full-text search.
                        "description": {
                            "type": "text",
                            "analyzer": "swedish"
                        }
                    }
                }
            }
        }
        self.es.indices.create(index=INDEX_NAME, body=request_body)
        # Each non-blank line is one bulk action (_index/_type/_source).
        with open(file_path, "r") as f_in:
            actions = (json.loads(line) for line in f_in if line.strip())
            print("Performed bulk index: {}".format(bulk(self.es, actions)))
        self.es.indices.refresh(index=INDEX_NAME)
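With "location" mapped as keyword, the stored value is exactly "Uppsala", so the term filter has to use that spelling and case ("uppsala" would no longer match). A minimal sketch of building that query body in Python; the helper name is hypothetical. It also shows one possible alternative mapping, an assumption rather than what the question used: keep the analyzed text field and add an un-analyzed keyword sub-field through multi-fields (the sub-field name "raw" is made up), so the same data supports both full-text search and exact filtering.

```python
import json


def location_filter(field, value):
    """Build a bool/filter query with one term clause.

    Term queries are not analyzed, so for a keyword field the value
    must equal the stored value exactly, including case.
    """
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


# Hypothetical alternative: analyzed text for full-text search plus an
# exact-match keyword sub-field, addressed as "location.raw" in queries.
ALT_MAPPINGS = {
    "mappings": {
        "motorcycle": {
            "properties": {
                "location": {
                    "type": "text",
                    "analyzer": "swedish",
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
}

# With location mapped as keyword (as in the code above):
print(json.dumps(location_filter("location", "Uppsala"), indent=2))
# With ALT_MAPPINGS, filter on the keyword sub-field instead:
print(json.dumps(location_filter("location.raw", "Uppsala"), indent=2))
```

The resulting JSON can be posted to localhost:9200/simple/_search as before, or passed as the body of a search call from Python.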
