We have some JSON data stored in HDFS and we are trying to use elasticsearch-hadoop MapReduce to ingest it into Elasticsearch.
The code we used is very simple (below):
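import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class TestOneFileJob extends Configured implements Tool {

    // Identity mapper: passes each input line (one JSON document) through unchanged.
    public static class Tokenizer extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable arg0, Text value, OutputCollector<LongWritable, Text> output,
                Reporter reporter) throws IOException {
            output.collect(arg0, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), TestOneFileJob.class);
        job.setJobName("demo.mapreduce");
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(EsOutputFormat.class);
        job.setMapperClass(Tokenizer.class);
        job.setSpeculativeExecution(false);
        FileInputFormat.setInputPaths(job, new Path(args[1]));
        // The target index is resolved per document from its index_name field.
        job.set("es.resource.write", "{index_name}/live_tweets");
        job.set("es.nodes", "els-test.css.org");
        // The map output values are already JSON, so they are sent as-is.
        job.set("es.input.json", "yes");
        job.setMapOutputValueClass(Text.class);
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TestOneFileJob(), args));
    }
}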
This code worked fine but we have two issues with it.
The first issue is the value of the es.resource.write property. Currently it is taken from the index_name property in the JSON.
If the JSON contains a property of type array, like
{
"tags" : [{"tag" : "tag1"}, {"tag" : "tag2"}]
}
how can we configure es.resource.write to take the first tag value, for example? We tried {tags.tag} and {tags[0].tag}, but neither worked.
The other issue: how can we make the job index the JSON document once for each of the two values of the tags property?
We solved the two problems by doing the following:
1- In the run method we set the value of es.resource.write as follows:
job.set("es.resource.write", "{tag}/live_tweets");
2- In the map function we convert the JSON into an object (the POJO of the JSON we have) using the gson library:
Object currentValue = gson.fromJson(jsonString, Object.class);
3- From the object we extract the tag we want and add its value as a new top-level property to the JSON.
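Step 3 can be sketched without any JSON library as follows. This is only an illustration under stated assumptions: TagInjector, escapeJson and withTag are hypothetical names that do not come from the answer or from elasticsearch-hadoop, and the input is assumed to be a single JSON object. With gson already on the classpath, JsonObject.addProperty would be a more natural way to get the same result.

```java
// Hypothetical helper; not part of elasticsearch-hadoop or the original answer.
final class TagInjector {

    private TagInjector() {}

    /** Escapes the characters that may not appear raw inside a JSON string. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the six-character escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /**
     * Returns a copy of json with "tag":"<tag>" inserted as its first property.
     * Assumption: json holds exactly one JSON object, so its first '{' opens it.
     */
    static String withTag(String json, String tag) {
        int open = json.indexOf('{');
        if (open < 0) {
            throw new IllegalArgumentException("not a JSON object: " + json);
        }
        String rest = json.substring(open + 1);
        // An empty object needs no comma after the new property.
        boolean empty = rest.trim().startsWith("}");
        String property = "\"tag\":\"" + escapeJson(tag) + "\"";
        return json.substring(0, open + 1) + property + (empty ? "" : ",") + rest;
    }
}
```

For example, TagInjector.withTag("{\"a\":1}", "news") yields {"tag":"news","a":1}, which a resource pattern of {tag}/live_tweets can then resolve against.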
The previous steps solved the first problem. For the second problem (storing the same JSON in multiple indexes, one per tag), we simply loop through the tags in the JSON, change the tag property we added, and pass the JSON to the collector again. Below is the code for this step.
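@Override
public void map(LongWritable arg0, Text value, OutputCollector<LongWritable, Text> output, Reporter reporter)
        throws IOException {
    String json = value.toString();
    List<String> tags = getTags(json);
    for (String tag : tags) {
        // Emit one copy of the document per tag, each with a top-level "tag" property
        // that the {tag} pattern in es.resource.write resolves against.
        // quoteReplacement keeps '$' and '\' in a tag from being read as replacement syntax.
        String newJson = json.replaceFirst("\\{",
                java.util.regex.Matcher.quoteReplacement("{\"tag\":\"" + tag + "\","));
        output.collect(arg0, new Text(newJson));
    }
}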
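The getTags helper called in the mapper is not shown in the answer. One possible stand-in, assuming the tags always appear as "tag" : "value" pairs as in the sample document, is the regex-based sketch below. TagExtractor is a hypothetical name, the pattern matches every "tag" key anywhere in the document rather than only those inside the tags array, and values are returned exactly as they appear in the JSON text (escape sequences are not decoded). A real JSON parser such as gson would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the getTags helper, which the answer does not show.
final class TagExtractor {

    // Matches "tag" : "<value>" and captures the value, allowing escaped characters inside it.
    private static final Pattern TAG =
            Pattern.compile("\"tag\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    private TagExtractor() {}

    /** Returns the values of all "tag" properties, in document order. */
    static List<String> getTags(String json) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(json);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }
}
```

Applied to the sample document from the question, this returns the two values tag1 and tag2, so the mapper would emit two copies of the document.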