
Customizing the Nutch IndexWriter to map values into a multi-level (JSON-like) Elasticsearch document

I am developing a plugin for Apache Nutch to customize the IndexWriter. My problem is that, inside the plugin, the NutchDocument only lets me put data at the first level, not at the second level. For instance, for "a", "location" and "url" I can easily add data with doc.add("url", "www.csad.com");, but for "company", which is a complex object, sending an instance of my Company class does not work.

This is my index mapping in Elasticsearch:

{
   "properties":{
      "a":{
         "type":"string"
      },
      "company":{
         "type":"object",
         "properties":{
            "id":{
               "type":"integer",
               "index":"not_analyzed"
            },
            "type":{
               "type":"string",
               "index":"not_analyzed"
            },
            "name":{
               "type":"string"
            },
            "location":{
               "type":"geo_point"
            },
            "slug":{
               "type":"string",
               "index":"not_analyzed"
            }
         }
      },
      "location":{
         "type":"geo_point",
         "lat_lon":"true"
      },
      "url":{
         "type":"string",
         "index":"not_analyzed"
      }
   }
}

I can't send data to "company" from the Java plugin, while without company everything works well:

      doc.add("location", rs.getString("ic_company_lat") + "," + rs.getString("ic_company_lng"));

      Company cmp = new Company();
      cmp.setId(Integer.parseInt(rs.getString("ic_company_id")));
      cmp.setType("type");
      cmp.setName(rs.getString("ic_company_name"));
      doc.add("company", cmp);

Assuming that you're using the Elasticsearch indexer plugin, Nutch does not support indexing custom classes out of the box. You can add such an object to the NutchDocument instance, but you would need to write your own logic to handle it in the ES/Solr indexers, i.e., modify those plugins.

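You can accomplish what you want using a simple HashMap:

import java.util.HashMap;
import java.util.Map;

// Build the nested "company" object as a map of field name -> value
Map<String, String> map = new HashMap<>();
map.put("name", "Company Name");
...

doc.add("company", map);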
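If you want to keep the Company class from the question, the same idea works by having the class convert itself into a Map before it is added, so no plugin changes are needed. This is a minimal sketch under assumptions: the fields follow the index mapping above (id, type, name, location, slug), and the toMap() method and the two-argument setLocation are hypothetical additions, not part of Nutch. The location is stored as a "lat,lon" string, which is one of the formats Elasticsearch accepts for a geo_point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical version of the question's Company class with a toMap() helper.
class Company {
    private Integer id;
    private String type;
    private String name;
    private String location; // "lat,lon" string for the geo_point field
    private String slug;

    public void setId(Integer id) { this.id = id; }
    public void setType(String type) { this.type = type; }
    public void setName(String name) { this.name = name; }
    public void setSlug(String slug) { this.slug = slug; }

    // Hypothetical setter: joins latitude and longitude into "lat,lon".
    public void setLocation(String lat, String lng) { this.location = lat + "," + lng; }

    // Converts this object into a plain Map, skipping fields that were never set.
    public Map<String, Object> toMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        putIfNotNull(map, "id", id);
        putIfNotNull(map, "type", type);
        putIfNotNull(map, "name", name);
        putIfNotNull(map, "location", location);
        putIfNotNull(map, "slug", slug);
        return map;
    }

    private static void putIfNotNull(Map<String, Object> map, String key, Object value) {
        if (value != null) {
            map.put(key, value);
        }
    }
}
```

In the plugin you would then call doc.add("company", cmp.toMap()); instead of doc.add("company", cmp);. Using Map<String, Object> rather than Map<String, String> keeps id as an Integer, and skipping null fields avoids sending explicit nulls for columns you did not fill.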

With this you'll get a document with the structure that you're after in ES:

"company": {
    "name": "Company Name",
    ...
},
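To see why a nested Map ends up as a nested JSON object, here is a toy renderer that turns Maps into JSON objects and Lists into arrays. It is only an illustration: the NestedJson class is hypothetical, its string escaping is minimal (backslash and double quote only), and it is not how the Elasticsearch client actually serializes documents.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy serializer showing how nested Maps correspond to nested JSON objects.
class NestedJson {
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(",");
                first = false;
                // Each map entry becomes a "key": value pair; nested Maps recurse.
                sb.append(quote(String.valueOf(e.getKey()))).append(":").append(toJson(e.getValue()));
            }
            return sb.append("}").toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(",");
                first = false;
                sb.append(toJson(item));
            }
            return sb.append("]").toString();
        }
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        return quote(value.toString());
    }

    // Minimal escaping: only backslashes and double quotes are handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> company = new LinkedHashMap<>();
        company.put("name", "Company Name");
        company.put("id", 42);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("url", "www.csad.com");
        doc.put("company", company);
        // Prints {"url":"www.csad.com","company":{"name":"Company Name","id":42}}
        System.out.println(toJson(doc));
    }
}
```

The top-level entries stay flat fields ("url"), while the Map stored under "company" becomes the second-level object that the mapping's "company" properties describe.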

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM