
Java: Load sequence file from HDFS as JavaRDD<Vector>

I have the following method to write a file to HDFS:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public void writePointsToFile(Path path, FileSystem fs, Configuration conf,
        List<Vector> points) throws IOException {

    // Mahout's Vector is not Writable, so use VectorWritable as the value class.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            Writer.file(path), Writer.keyClass(LongWritable.class),
            Writer.valueClass(VectorWritable.class));

    long recNum = 0;

    for (Vector point : points) {
        writer.append(new LongWritable(recNum++), new VectorWritable(point));
    }
    writer.close();
}

I need to know how to read this file back as a JavaRDD<Vector> so that it can be used with Spark's k-means clustering.

The typical pattern with Spark is to transform immutable objects into new ones, so transforming a DRM (Mahout distributed row matrix) or a collection of Mahout Vectors is the way you should do this. Beyond that, I'm not sure what you are asking.
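For the reading side, here is a minimal sketch, assuming the file was written with LongWritable keys and VectorWritable values as in the corrected method above. The class name VectorReader and the methods readPoints and cluster are illustrative, not from the original post:

import org.apache.hadoop.io.LongWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vectors;

public class VectorReader {

    // Read the sequence file back as a JavaRDD of Mahout vectors.
    public static JavaRDD<org.apache.mahout.math.Vector> readPoints(
            JavaSparkContext sc, String path) {
        // sequenceFile() reuses the same Writable instance across records,
        // so clone each vector before it leaves the closure.
        return sc.sequenceFile(path, LongWritable.class, VectorWritable.class)
                 .map(pair -> pair._2().get().clone());
    }

    // Spark MLlib's k-means works on org.apache.spark.mllib.linalg.Vector,
    // not Mahout's Vector, so convert each point to a dense MLlib vector.
    public static KMeansModel cluster(JavaRDD<org.apache.mahout.math.Vector> points,
            int k, int maxIterations) {
        JavaRDD<org.apache.spark.mllib.linalg.Vector> mllibPoints = points.map(v -> {
            double[] values = new double[v.size()];
            for (int i = 0; i < v.size(); i++) {
                values[i] = v.getQuick(i);
            }
            return Vectors.dense(values);
        });
        return KMeans.train(mllibPoints.rdd(), k, maxIterations);
    }
}

The clone() call matters because Hadoop's record readers recycle Writable objects; without it, every element of a cached RDD would end up pointing at the last vector read.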

