How to read multiple image files as input from HDFS in MapReduce?
// imports and class declaration restored so the snippet compiles
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
// LIRE's AutoColorCorrelogram; the exact package may differ by LIRE version
import net.semanticmetadata.lire.imageanalysis.AutoColorCorrelogram;

public class image_mapreduce extends Configured implements Tool {
private static String[] testFiles = new String[] {"img01.JPG","img02.JPG","img03.JPG","img04.JPG","img06.JPG","img07.JPG","img05.JPG"};
// private static String testFilespath = "/home/student/Desktop/images";
private static String testFilespath ="hdfs://localhost:54310/user/root/images";
//private static String indexpath = "/home/student/Desktop/indexDemo";
private static String testExtensive="/home/student/Desktop/images";
public static class MapClass extends MapReduceBase
implements Mapper<Text, Text, Text, Text> {
private Text input_image = new Text();
private Text input_vector = new Text();
@Override
public void map(Text key, Text value,OutputCollector<Text, Text> output,Reporter reporter) throws IOException {
System.out.println("CorrelogramIndex Method:");
String featureString;
int MAXIMUM_DISTANCE = 16;
AutoColorCorrelogram.Mode mode = AutoColorCorrelogram.Mode.FullNeighbourhood;
for (String identifier : testFiles) {
try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
//Document doc = builder.createDocument(fis, identifier);
//FileInputStream imageStream = new FileInputStream(testFilespath + "/" + identifier);
BufferedImage bimg = ImageIO.read(fis);
AutoColorCorrelogram vd = new AutoColorCorrelogram(MAXIMUM_DISTANCE, mode);
vd.extract(bimg);
featureString = vd.getStringRepresentation();
double[] bytearray=vd.getDoubleHistogram();
System.out.println("image: "+ identifier + " " + featureString );
}
System.out.println(" ------------- ");
input_image.set(identifier);
input_vector.set(featureString);
output.collect(input_image, input_vector);
}
}
}
public static class Reduce extends MapReduceBase
implements Reducer<Text, Text, Text, Text> {
@Override
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
// String.concat() returns a new string; the original code discarded that
// result, so out_vector stayed empty. Accumulate with a StringBuilder instead.
StringBuilder out_vector = new StringBuilder();
while (values.hasNext()) {
out_vector.append(values.next().toString());
}
output.collect(key, new Text(out_vector.toString()));
}
}
static int printUsage() {
System.out.println("image_mapreduce [-m <maps>] [-r <reduces>] <input> <output>");
ToolRunner.printGenericCommandUsage(System.out);
return -1;
}
@Override
public int run(String[] args) throws Exception {
JobConf conf = new JobConf(getConf(), image_mapreduce.class);
conf.setJobName("image_mapreduce");
// the keys are image file names
conf.setOutputKeyClass(Text.class);
// the values are feature-vector strings
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MapClass.class);
// conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
List<String> other_args = new ArrayList<String>();
for(int i=0; i < args.length; ++i) {
try {
if ("-m".equals(args[i])) {
conf.setNumMapTasks(Integer.parseInt(args[++i]));
} else if ("-r".equals(args[i])) {
conf.setNumReduceTasks(Integer.parseInt(args[++i]));
} else {
other_args.add(args[i]);
}
} catch (NumberFormatException except) {
System.out.println("ERROR: Integer expected instead of " + args[i]);
return printUsage();
} catch (ArrayIndexOutOfBoundsException except) {
System.out.println("ERROR: Required parameter missing from " +
args[i-1]);
return printUsage();
}
}
FileInputFormat.setInputPaths(conf, other_args.get(0));
//FileInputFormat.setInputPaths(conf,new Path("hdfs://localhost:54310/user/root/images"));
FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new image_mapreduce(), args);
System.exit(res);
}
}
I am writing a program that takes multiple image files stored in HDFS as input and extracts features in the map function. How can I specify the path to read an image from in FileInputStream(some arguments)? Or is there a way to read multiple image files?

What I want to do is:
- take multiple image files stored in HDFS as input
- extract features in the map function
- process them in reduce

Please help me with code, or point me to a better approach.
Look into using the HIPI library - it stores a collection of images in an ImageBundle (which is more efficient than storing the individual image files in HDFS). They also have several examples.
As for your code, you need to specify the input and output formats you plan to use. There is no built-in input format that hands over a whole file, but you can extend FileInputFormat and create a RecordReader that emits <Text, BytesWritable> pairs, where the key is the file name and the value is the bytes of the image file.
In fact, Hadoop: The Definitive Guide includes an example of exactly this input format.
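A sketch of what such an input format could look like against the old `mapred` API used in the question. It is modeled on the whole-file input format example from the book, but adapted (untested) to emit the file name as the key rather than NullWritable, as the answer above suggests:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false; // never split one image across mappers
    }

    @Override
    public RecordReader<Text, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }
}

class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, JobConf conf) {
        this.split = split;
        this.conf = conf;
    }

    @Override
    public boolean next(Text key, BytesWritable value) throws IOException {
        if (processed) return false; // one record per file
        Path file = split.getPath();
        byte[] contents = new byte[(int) split.getLength()];
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = fs.open(file);
        try {
            IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        key.set(file.getName());                 // key: the image file name
        value.set(contents, 0, contents.length); // value: the raw image bytes
        processed = true;
        return true;
    }

    @Override public Text createKey() { return new Text(); }
    @Override public BytesWritable createValue() { return new BytesWritable(); }
    @Override public long getPos() { return processed ? split.getLength() : 0; }
    @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
    @Override public void close() throws IOException { }
}
```

You would then register it with `conf.setInputFormat(WholeFileInputFormat.class)` and change the mapper's input types to `<Text, BytesWritable>`.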
If you want to send all the images in a directory as input to the MR job, just point FileInputFormat.setInputPaths() at that directory. If you only want selected images from a folder, you can add multiple paths when configuring the job.
One way is to build a Path[] with one entry per image. Alternatively, you can pass a single string containing all the paths separated by commas. Go through the following documentation:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html
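A configuration sketch of those options against the old `mapred` FileInputFormat; the file names and HDFS URL below are hypothetical placeholders mirroring the question's setup:

```java
JobConf conf = new JobConf(image_mapreduce.class);
String dir = "hdfs://localhost:54310/user/root/images";
String[] images = {"img01.JPG", "img02.JPG", "img03.JPG"};

// Option 1: add each image path individually.
for (String img : images) {
    FileInputFormat.addInputPath(conf, new Path(dir, img));
}

// Option 2: a single comma-separated string of paths.
FileInputFormat.setInputPaths(conf,
    dir + "/img01.JPG," + dir + "/img02.JPG," + dir + "/img03.JPG");

// Option 3: a Path[] built from the file list.
Path[] paths = new Path[images.length];
for (int i = 0; i < images.length; i++) {
    paths[i] = new Path(dir, images[i]);
}
FileInputFormat.setInputPaths(conf, paths);
```

Note that setInputPaths() replaces any previously configured paths, while addInputPath() appends.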
One more thing: you have to set the map input types to <Text, BytesWritable> and extract the image features from that byte array, instead of creating a new FileInputStream.
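Inside the mapper, the bytes of the BytesWritable value can be wrapped in a ByteArrayInputStream and decoded with ImageIO. A minimal, Hadoop-free sketch of just that decoding step (the surrounding mapper class is omitted, and the in-memory JPEG stands in for an image read from HDFS):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.imageio.ImageIO;

public class BytesToImageDemo {

    // Decode an image from raw bytes, as a mapper would do with the
    // contents of a BytesWritable value (e.g. Arrays.copyOf(value.getBytes(),
    // value.getLength()) to trim the internal buffer to its real length).
    static BufferedImage decode(byte[] imageBytes) throws Exception {
        return ImageIO.read(new ByteArrayInputStream(imageBytes));
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny in-memory JPEG to stand in for an HDFS image file.
        BufferedImage src = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ImageIO.write(src, "jpg", bos);
        byte[] imageBytes = bos.toByteArray();

        BufferedImage bimg = decode(imageBytes);
        System.out.println(bimg.getWidth() + "x" + bimg.getHeight()); // prints 4x3
    }
}
```

The decoded BufferedImage can then be handed to the feature extractor (AutoColorCorrelogram in the question) exactly as before.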