MapReduce Job not producing output due to globStatus
I'm not sure why my Mapper and Reducer produce no output. The logic behind my code is: given a file of UUIDs (newline separated), I want to use globStatus to list the paths of all files that a UUID might be in, then open and read each file. Each file contains 1 to n lines of JSON. The UUID is in event_header.event_id in the JSON.
Right now the MapReduce job runs without errors. However, something is wrong because I don't get any output. I'm also not sure how to debug MapReduce jobs. If someone could point me to a resource, that would be awesome! The expected output from this program should be:
fee90c3f-e832-4267-aa9b-250f53kc06d3 1
914938ae-eed6-4dfa-81bf-71e67m42d93a 1
bbge6012-9c51-4ae1-9242-a4aaf08bfb36 1
e5a12493-gtrf-4ar4-9235-02fd3h580970 1
3b054300-09ba-4d59-a6ac-a0975ca74ed5 1
6fbb1c5g-15ce-4e6f-9236-55a9d9d6e2c6 1
ab4677a3-0f58-428c-8h58-5fe3dfe528dc 1
caaa011d-ahba-4ne7-9h05-3872f3k1854c 1
Example JSON:
{"event_header":{"version":"1.0","event_id":"fdk32k23-f7f6-412d-879d-f79b4c3b0d55","server_timestamp":1427734304673,"client_ip_address":"10.144.28.48","server_ip_address":"10.129.67.0"},"data_version":"1.0","application":{"properties":{}},"session":{"test":false,"user_id":"1121057496"},"event":{"timestamp":"1427734304577","event_category":"User","traffic":{"priority_code":"1728300000"},"event_id":"9ad26251-b940-408a-b6a9-0a825be1fd38","event_name":"Create"}}
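Note that in this sample event_id occurs twice (once under event_header, once under event), and that event_header.event_id is a JSON string rather than a nested object. As a rough illustration of pulling out only the header's value, here is a minimal stdlib-only sketch. It uses a regular expression as a stand-in for a real JSON parser and assumes event_header is a flat object with no nested braces, as in the sample; the class and method names (EventIdExtractor, headerEventId) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EventIdExtractor {
    // Matches "event_id":"..." inside a flat "event_header" object.
    // Assumption: event_header contains no nested { } before event_id.
    private static final Pattern HEADER_EVENT_ID = Pattern.compile(
        "\"event_header\"\\s*:\\s*\\{[^{}]*?\"event_id\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the event_header.event_id value, or empty if the line has none.
    public static Optional<String> headerEventId(String jsonLine) {
        Matcher m = HEADER_EVENT_ID.matcher(jsonLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened version of the sample line, keeping both event_id fields.
        String line = "{\"event_header\":{\"version\":\"1.0\","
            + "\"event_id\":\"fdk32k23-f7f6-412d-879d-f79b4c3b0d55\"},"
            + "\"event\":{\"event_id\":\"9ad26251-b940-408a-b6a9-0a825be1fd38\"}}";
        System.out.println(headerEventId(line).orElse("<none>"));
    }
}
```

In a real job a JSON library is the safer choice; the regex only serves to show which of the two event_id fields is the one being compared.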
In my logic, the output file should list the UUIDs with a 1 next to them: when a UUID is found, 1 is written; when it is not found, 0 is written. They should all be 1s because I pulled the UUIDs from the source.
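One consequence of this logic is worth spelling out: if a flag is written for every JSON line that is compared, a UUID that matches one line out of many produces a single 1 and many 0s, not one line per UUID. A possible way to get one result per UUID is to collapse the flags on the reduce side. The following is a minimal plain-Java sketch of that collapsing step only; FoundFlag and combine are hypothetical names, and it is not wired into Hadoop.

```java
import java.util.Arrays;
import java.util.List;

public class FoundFlag {
    // Returns "1" if any per-line flag for a UUID is "1", otherwise "0".
    public static String combine(Iterable<String> flags) {
        for (String flag : flags) {
            if ("1".equals(flag)) {
                return "1";
            }
        }
        return "0";
    }

    public static void main(String[] args) {
        // Flags as they might arrive for one UUID: three misses and one hit.
        List<String> flags = Arrays.asList("0", "0", "1", "0");
        System.out.println("fee90c3f-e832-4267-aa9b-250f53kc06d3 " + combine(flags));
    }
}
```

With the flags shown, the UUID is reported once with a 1, which matches the expected output format listed earlier.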
I added the line context.write(new Text("None"), new Text("blank")) in the for loop and found that nothing is written to the output. So I think I can safely conclude that I am using globStatus() incorrectly.
My Reducer currently does not do anything; I just wanted to see whether I could get some simple logic working. There are most likely bugs in my code, as I don't know of an easy way to debug MapReduce jobs.
Driver:
public class SearchUUID {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "UUID Search");
        job.getConfiguration().set("mapred.job.queue.name", "exp_dsa");
        job.setJarByClass(SearchUUID.class);
        job.setMapperClass(UUIDMapper.class);
        job.setReducerClass(UUIDReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
UUIDMapper:
public class UUIDMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        try {
            Text one = new Text("1");
            Text zero = new Text("0");
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus[] paths = fs.globStatus(new Path("/data/path/to/file/d_20150330-1650"));
            for (FileStatus path : paths) {
                BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path.getPath())));
                String json_string = br.readLine();
                while (json_string != null) {
                    JsonElement jelement = new JsonParser().parse(json_string);
                    JsonObject jsonObject = jelement.getAsJsonObject();
                    jsonObject = jsonObject.getAsJsonObject("event_header");
                    jsonObject = jsonObject.getAsJsonObject("event_id");
                    if (value.toString().equals(jsonObject.getAsString())) {
                        System.out.println(value.toString() + "slkdjfksajflkjsfdkljsadfk;ljasklfjklasjfklsadl;sjdf");
                        context.write(value, one);
                    } else {
                        context.write(value, zero);
                    }
                    json_string = br.readLine();
                }
            }
        } catch (IOException failed) {
        }
    }
}
Reducer:
public class UUIDReducer extends Reducer<Text, Text, Text, Text>{
    public void reduce(Text key, Text value, Context context) throws IOException, InterruptedException{
        context.write(key, value);
    }
}
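A side note on the reducer above: Hadoop's Reducer.reduce takes the values as an Iterable (key, Iterable of values, context), so a method declared with a single Text value is an overload rather than an override, and the framework would keep calling the inherited default. The sketch below reproduces that Java behaviour with hypothetical stand-in classes (BaseReducer is not Hadoop's Reducer, and the output list stands in for Context), so it runs without any Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverridePitfall {
    // Hypothetical stand-in for a framework base class (not Hadoop's Reducer).
    public static class BaseReducer {
        // The framework only calls this signature; the default passes values through.
        public void reduce(String key, Iterable<String> values, List<String> out) {
            for (String v : values) {
                out.add(key + " " + v);
            }
        }
    }

    public static class OverloadingReducer extends BaseReducer {
        // Different parameter type: this is an overload and is never called by run().
        public void reduce(String key, String value, List<String> out) {
            out.add(key + " custom:" + value);
        }
    }

    public static class OverridingReducer extends BaseReducer {
        // Same signature as the base method; @Override makes the compiler check it.
        @Override
        public void reduce(String key, Iterable<String> values, List<String> out) {
            out.add(key + " custom:" + String.join(",", values));
        }
    }

    // Mimics the framework: it always calls reduce(key, Iterable, out).
    public static List<String> run(BaseReducer reducer) {
        List<String> out = new ArrayList<>();
        reducer.reduce("k", Arrays.asList("0", "1"), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new OverloadingReducer())); // [k 0, k 1]
        System.out.println(run(new OverridingReducer()));  // [k custom:0,1]
    }
}
```

Adding @Override to a method that is meant to replace a framework method turns this kind of silent mismatch into a compile error.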
Did you check the userlogs inside the log folder? The following code works fine.
These lines are not correct:
jsonObject = jsonObject.getAsJsonObject("event_header");
jsonObject = jsonObject.getAsJsonObject("event_id");
Use this instead:
jsonObject.get("event_header").getAsJsonObject();
jsonObject.get("event_id").getAsJsonObject();
The problem is in getting the event_header and event_id JSON objects.
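import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class UUIDMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        try {
            Text one = new Text("1");
            Text zero = new Text("0");
            // Placeholder entries; replace with the values to compare against.
            String json_string[] = {
                "your data", "your data", "your data", "your data "
            };
            int i = 0;
            while (i < json_string.length) {
                if (value.toString().equals(json_string[i])) {
                    context.write(value, one);
                } else {
                    context.write(value, zero);
                }
                i++; // advance to the next entry; without this the loop never ends
            }
        } catch (Exception t) {
            // Print the failure so it shows up in the task logs.
            t.printStackTrace();
        }
    }
}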