
How to do full-text indexing and search on the JSON document below in ArangoDB?

{
    "batters": {
        "batter": [
            { "id": "1001", "type": "Regular" },
            { "id": "1002", "type": "Chocolate" },
            { "id": "1003", "type": "Blueberry" },
            { "id": "1004", "type": "Devil's Food" }
        ]
    },
    "topping": [
        { "id": "5001", "type": "None" },
        { "id": "5002", "type": "Glazed" },
        { "id": "5005", "type": "Sugar" },
        { "id": "5007", "type": "Powdered Sugar" },
        { "id": "5006", "type": "Chocolate with Sprinkles" },
        { "id": "5003", "type": "Chocolate" },
        { "id": "5004", "type": "Maple" }
    ]
}

Basically, to have full-text search here I would need to index "batters.batter" and also "topping", i.e. two attributes. How should I handle this kind of full-text search? Please explain the method; I would like to implement my search through the REST API. Thank you in advance.

The best way to solve this is to change the data layout a little, since a fulltext index can only cover one attribute, and querying the index twice would not be fast. Therefore we use an anonymous graph to connect the strings to their object.

So we create two vertex collections and one edge collection, and put the fulltext index on the vertex collection that holds the strings:

db._create("dishStrings")
db._createEdgeCollection("dishEdges")
db._create("dish")

db.dishStrings.ensureIndex({type: "fulltext", fields: [ "name" ]});
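Since the question asks about the REST API: the same index can, as far as I know, also be created over HTTP with a POST to /_api/index. Below is a minimal sketch that only builds the request description; fulltextIndexRequest is a hypothetical helper name, and the server address and authentication are left out on purpose.

```javascript
// Sketch: describe the HTTP request that creates the same fulltext index.
// fulltextIndexRequest is a hypothetical helper, not part of any ArangoDB driver.
function fulltextIndexRequest(collection, field) {
  return {
    method: "POST",
    // The target collection is passed as a query parameter.
    path: "/_api/index?collection=" + encodeURIComponent(collection),
    // Same index definition as in the ensureIndex() call above.
    body: { type: "fulltext", fields: [field] }
  };
}

const indexRequest = fulltextIndexRequest("dishStrings", "name");
console.log(indexRequest.method, indexRequest.path);
console.log(JSON.stringify(indexRequest.body));
```

The body would be sent as JSON with whatever HTTP client you prefer.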

Then we save the documents, together with the edges that tie them together. We set the _key attribute explicitly, because the _from and _to attributes of the edges use it to reference vertices:

db.dishStrings.save({"_key": "1001", "name": "Regular" , type: "Batter"});
db.dishStrings.save({"_key": "1002", "name": "Chocolate", type: "Batter" });
db.dishStrings.save({"_key": "1003", "name": "Blueberry", type: "Batter"});
db.dishStrings.save({"_key": "1004", "name": "Devil's Food", type: "Batter"});
db.dishStrings.save({"_key": "5001", "name": "None", type: "Topping"});
db.dishStrings.save({"_key": "5002", "name": "Glazed", type: "Topping"});
db.dishStrings.save({"_key": "5005", "name": "Sugar", type: "Topping"});
db.dishStrings.save({"_key": "5007", "name": "Powdered Sugar", type: "Topping"});
db.dishStrings.save({"_key": "5006", "name": "Chocolate with Sprinkles", type: "Topping"});
db.dishStrings.save({"_key": "5003", "name": "Chocolate", type: "Topping"});
db.dishStrings.save({"_key": "5004", "name": "Maple", type: "Topping"});

db.dishEdges.save("dishStrings/1001", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/1002", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/1003", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/1004", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/5001", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5002", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5003", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5004", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5005", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5006", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5007", "dish/batter", {tasty: true, type: "Topping"})

db.dish.save({_key: "batter", tasty: true})
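If you have many such documents, the save calls above need not be written by hand. Here is a rough sketch in plain JavaScript that derives the string documents and the edges from the original nested layout; flattenDish and its dishKey parameter are names I made up for illustration, and the sample input is a shortened version of the document from the question.

```javascript
// Sketch: derive the dishStrings documents and dishEdges edges from the
// original nested document. flattenDish and dishKey are hypothetical names.
function flattenDish(dish, dishKey) {
  const groups = [
    { type: "Batter", items: dish.batters.batter },
    { type: "Topping", items: dish.topping }
  ];
  const strings = [];
  const edges = [];
  for (const group of groups) {
    for (const item of group.items) {
      // The original id becomes the _key, the original type becomes the name.
      strings.push({ _key: item.id, name: item.type, type: group.type });
      edges.push({
        _from: "dishStrings/" + item.id,
        _to: "dish/" + dishKey,
        tasty: true,
        type: group.type
      });
    }
  }
  return { strings, edges };
}

// Shortened sample of the document from the question.
const sample = {
  batters: { batter: [{ id: "1001", type: "Regular" }, { id: "1002", type: "Chocolate" }] },
  topping: [{ id: "5003", type: "Chocolate" }]
};
const flat = flattenDish(sample, "batter");
console.log(flat.strings);
console.log(flat.edges);
```

Each entry in strings corresponds to one dishStrings.save() call, and the _from and _to values of each edge correspond to the first two arguments of dishEdges.save().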

We verify that the fulltext index works:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', 'Chocolate')" +
          " RETURN oneDishStr").toArray()

(.toArray() prints the result on the console.) We get 3 hits: one batter and two toppings. Since search strings may come from unvalidated input, we should use bind variables to prevent injection:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) " + 
          " RETURN oneDishStr", 
          {searchString: "Chocolate"});
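Over the REST API, the same query with bind variables would go to the cursor endpoint (POST /_api/cursor), with the query text and the bindVars in the JSON body. A minimal sketch of building that body follows; fulltextSearchRequest is a hypothetical helper, and the example input is deliberately hostile to show that it never ends up inside the query text.

```javascript
// Sketch: build the body for POST /_api/cursor, keeping user input in bindVars.
// fulltextSearchRequest is a hypothetical helper name.
function fulltextSearchRequest(searchString) {
  return {
    method: "POST",
    path: "/_api/cursor",
    body: {
      query: "FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) " +
             "RETURN oneDishStr",
      // The user-supplied text travels separately from the query string.
      bindVars: { searchString: searchString }
    }
  };
}

// A hostile-looking input stays a plain value and does not alter the query.
const userInput = "Chocolate') RETURN 1 //";
const searchRequest = fulltextSearchRequest(userInput);
console.log(JSON.stringify(searchRequest.body, null, 2));
```

If I remember the response format correctly, the matching documents come back in the result attribute of the JSON response.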

Now let's use the edge relation to find the connected dish:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) "+ 
          "RETURN {str: oneDishStr, " + 
                  "dishes: NEIGHBORS(dishStrings, dishEdges, oneDishStr," + 
                                     " 'outbound')}",
           {searchString: "Chocolate"})

This was the old way (up to 2.7) to use graphs. Since we want to use fast filters, let's translate it to the new 2.8 syntax:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) " + 
          "  FOR v IN 1..1 OUTBOUND oneDishStr dishEdges RETURN " + 
          "    {str: oneDishStr, dish: v}",
         {searchString: "Chocolate"})

We can see in both cases that we get one traversal for each of the 3 fulltext search hits for Chocolate. Now we are only interested in hits that are toppings, so we filter out all edges that aren't of type Topping:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) "+
          "   FOR v, e IN 1..1 OUTBOUND oneDishStr dishEdges " + 
          "      FILTER e.type == 'Topping' " +
          "         RETURN {str: oneDishStr, dish: v}", 
          {searchString: "Chocolate"})
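To make the expected outcome concrete, here is a small in-memory model of that filtered query in plain JavaScript, using a subset of the data. It does not talk to ArangoDB at all; the whole-word, case-insensitive comparison is only my approximation of how FULLTEXT matches a single search word, and searchToppings is a made-up name.

```javascript
// Sketch: an in-memory model of the filtered query, using a subset of the data.
// It does not talk to ArangoDB; whole-word, case-insensitive matching is an
// approximation of what FULLTEXT does for a single search word.
const dishStrings = [
  { _key: "1002", name: "Chocolate", type: "Batter" },
  { _key: "5002", name: "Glazed", type: "Topping" },
  { _key: "5006", name: "Chocolate with Sprinkles", type: "Topping" },
  { _key: "5003", name: "Chocolate", type: "Topping" }
];
const dishEdges = [
  { _from: "dishStrings/1002", _to: "dish/batter", tasty: true, type: "Batter" },
  { _from: "dishStrings/5002", _to: "dish/batter", tasty: true, type: "Topping" },
  { _from: "dishStrings/5006", _to: "dish/batter", tasty: true, type: "Topping" },
  { _from: "dishStrings/5003", _to: "dish/batter", tasty: true, type: "Topping" }
];
const dish = [{ _key: "batter", tasty: true }];

function searchToppings(word) {
  const needle = word.toLowerCase();
  const results = [];
  for (const str of dishStrings) {
    // Stand-in for the fulltext lookup on the name attribute.
    const words = str.name.toLowerCase().split(/[^a-z0-9]+/);
    if (!words.includes(needle)) continue;
    for (const e of dishEdges) {
      if (e._from !== "dishStrings/" + str._key) continue; // OUTBOUND from the hit
      if (e.type !== "Topping") continue;                  // FILTER e.type == 'Topping'
      const v = dish.find(d => "dish/" + d._key === e._to);
      results.push({ str: str, dish: v });
    }
  }
  return results;
}

const toppingHits = searchToppings("Chocolate");
console.log(toppingHits.map(hit => hit.str._key));
```

Of the three Chocolate hits, the batter is dropped by the edge filter, so only the two toppings remain, each paired with the connected dish.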
