Elasticsearch fails to return some documents

I have this data:

{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}

I am running this Python code to index it:

    import json

    from elasticsearch.helpers import bulk

    # Method of the indexer class (class definition not shown);
    # self.es is an Elasticsearch client.
    def create_index(self, file_path):
        """
        Takes path to file containing JSON-formatted data
        and indexes into Elasticsearch index.
        """
        print('Creating index "{}"'.format(INDEX_NAME))

        request_body = {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "motorcycle": {
                    "properties": {
                        "location": {
                            "type": "text",
                            "analyzer": "swedish"
                        },
                        "description": {
                            "type": "text",
                            "analyzer": "swedish"
                        }
                    }
                }
            }
        }
        self.es.indices.create(index=INDEX_NAME, body=request_body)
        # Each line of the input file is one bulk action (a JSON object).
        # The file_path argument is used here instead of a module-level constant.
        with open(file_path, "r") as f_in:
            actions = (json.loads(line) for line in f_in)
            print("Performed bulk index: {}".format(bulk(self.es, actions)))
        self.es.indices.refresh(index="simple")

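Now I am using Postman to query the index for all documents with location: Uppsala, which is the location of the first object (I ran the same query from Python, with the same result):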

POST to localhost:9200/simple/_search:
{
    "query": {
        "bool": {
            "filter": [

                {
                    "term": {
                        "location": "uppsala"
                    }
                }
            ]
        }
    }
}
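For reference, the same request body can be built from Python. This is a minimal sketch, not the asker's actual code: `build_location_filter` is a hypothetical helper name, and the commented-out call assumes an elasticsearch-py client named `es` connected to a running cluster.

```python
import json


def build_location_filter(location):
    """Build the bool/filter/term search body used in the request above."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"location": location}}
                ]
            }
        }
    }


body = build_location_filter("uppsala")
print(json.dumps(body, indent=4))

# With a running cluster and a client `es` (both assumptions), the search
# would be issued as:
# result = es.search(index="simple", body=body)
```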

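It returns nothing. The same happens if I change the location to uddevalla, which is also present in the original data (the second document).

However, if I change the location to norrköping, it returns the third document, as it should.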

What is the reason behind this erratic behavior?
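One way to investigate, sketched here under assumptions: a `term` query is not analyzed, whereas a `text` field is passed through its analyzer (here `swedish`) at index time, so the indexed tokens may differ from the raw value if the analyzer stems them. The `_analyze` API shows which tokens an analyzer produces for a given text, and a `match` query runs the query string through the field's analyzer before searching. The helper names below are hypothetical, and the commented-out calls assume an elasticsearch-py client named `es` and a running cluster.

```python
def build_analyze_body(text, analyzer="swedish"):
    """Request body for the _analyze API: which tokens does the analyzer emit?"""
    return {"analyzer": analyzer, "text": text}


def build_match_filter(field, value):
    """Same bool/filter shape as the question, but with an analyzed match query."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match": {field: value}}
                ]
            }
        }
    }


analyze_body = build_analyze_body("Uppsala")
match_body = build_match_filter("location", "Uppsala")
print(analyze_body)
print(match_body)

# Assumed usage against a live cluster (not run here):
# tokens = es.indices.analyze(index="simple", body=analyze_body)
# hits = es.search(index="simple", body=match_body)
```

Comparing the tokens returned by `_analyze` with the exact string passed to the `term` query would show whether the two can ever match.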

Update: I tried with a slightly larger data file:

{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:57", "price": 11000, "sellerName": "Tommy Antfolk", "description": "Ett bra tillf\u00e4lle att skaffa hoj med h\u00e4rlig kombination av Touring och sport till ett \u00f6verkomligt pris. Suzuki gsx 750 F 1996/1997 Fint skick f\u00f6r sin \u00e5lder Bes till 30/9 2018 Ring f\u00f6r mer info", "location": "V\u00e4rmd\u00f6", "id": 322, "title": "Suzuki Gsx 750 F Sport Touring", "modelYear": 1996, "url": "https://www.blocket.se/stockholm/Suzuki_Gsx_750_F_Sport_Touring_79080891.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:51", "price": 15500, "sellerName": "Ulla Ottosson", "description": "Ett underbart fordon jag \u00e4gt sedan \u00e5r 2000, anv\u00e4nd mest till och fr\u00e5n jobbet. L\u00e4ttk\u00f6rd, l\u00e4ttstartad och nyservad. Har g\u00e5tt endast 1660 mil. Vid n\u00e4rmare 70 \u00e5rs \u00e5lder \u00e4r det dags att ta farv\u00e4l av ett s\u00e5dant fordon!", "location": "Karlstad", "id": 327, "title": "Suzuki Burgman 400 AN", "modelYear": 1999, "url": "https://www.blocket.se/varmland/Suzuki_Burgman_400_AN_79080774.htm?ca=11&w=3", "vehicleType": "Scooter"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:33", "price": 85000, "sellerName": "mikael", "description": "Super fin H-D svart matt lackerad i perfekt skick \u00e5rsmodell 1996. Ny servad och alla slitdelar bytta. Pedant sk\u00f6tt 3400 mil Fler bilder skickas p\u00e5 beg\u00e4ran", "location": "Helsingborg", "id": 334, "title": "Harley-Davidson FXDWG", "modelYear": 1996, "url": "https://www.blocket.se/helsingborg/Harley_Davidson_FXDWG_79080441.htm?ca=11&w=3", "vehicleType": "Touring"}}

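Querying for värmdö returns the correct document. Querying for karlstad returns nothing (it should return one match). Querying for "Helsingborg" returns the correct document.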

Update 2:

The documents that fail to show up seem to be missing from every query. For example, this query:

{
    "query": {
        "bool": {
            "filter": [],
            "must": {
                "multi_match": {
                    "fields": [
                        "title^1.0",
                        "description"
                    ],
                    "operator": "or",
                    "query": "suzuki",
                    "type": "cross_fields"
                }
            }
        }
    }
}

returns only one result (location: Värmdö), when it should actually return two (the document with location: Karlstad is not returned).

Changing the index creation code to the following solved the problem:

    import json

    from elasticsearch.helpers import bulk

    # Method of the indexer class (class definition not shown);
    # self.es is an Elasticsearch client.
    def create_index(self, file_path):
        """
        Takes path to file containing JSON-formatted data
        and indexes into Elasticsearch index.
        """
        print('Creating index "{}"'.format(INDEX_NAME))

        request_body = {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "motorcycle": {
                    "properties": {
                        # keyword fields are indexed verbatim, without analysis
                        "location": {
                            "type": "keyword"
                        },
                        "vehicleType": {
                            "type": "keyword"
                        },
                        "description": {
                            "type": "text",
                            "analyzer": "swedish"
                        }
                    }
                }
            }
        }
        self.es.indices.create(index=INDEX_NAME, body=request_body)
        # Each line of the input file is one bulk action (a JSON object).
        # The file_path argument is used here instead of a module-level constant.
        with open(file_path, "r") as f_in:
            actions = (json.loads(line) for line in f_in)
            print("Performed bulk index: {}".format(bulk(self.es, actions)))
        self.es.indices.refresh(index="simple")
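
When documents appear to be missing, it can also help to compare what `bulk` reported with the number of actions sent. In elasticsearch-py, `helpers.bulk` returns a tuple of the number of successfully executed actions and either a list of errors or, with `stats_only=True`, an error count. The following is a minimal sketch; `summarize_bulk_result` is a hypothetical helper, not part of the library, and the sample tuple is made up for illustration.

```python
def summarize_bulk_result(result, expected_count):
    """Summarize the (success_count, errors) tuple returned by helpers.bulk."""
    success_count, errors = result
    # errors is a list by default, or an int when stats_only=True was used
    error_count = errors if isinstance(errors, int) else len(errors)
    return {
        "indexed": success_count,
        "errors": error_count,
        "missing": expected_count - success_count,
    }


# Illustrative input only: six actions sent, six reported as successful.
summary = summarize_bulk_result((6, []), expected_count=6)
print(summary)
```

A non-zero `missing` or `errors` value would point to an indexing problem rather than a query problem.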
