
Java: Load sequence file from HDFS as JavaRDD<Vector>

I have the following method to write a file to HDFS:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public void writePointsToFile(Path path, FileSystem fs, Configuration conf,
        List<Vector> points) throws IOException {

    // Mahout's Vector is not Writable, so use VectorWritable as the value class.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            Writer.file(path), Writer.keyClass(LongWritable.class),
            Writer.valueClass(VectorWritable.class));

    long recNum = 0;

    for (Vector point : points) {
        writer.append(new LongWritable(recNum++), new VectorWritable(point));
    }
    writer.close();
}

I need to know how to read this file back as a JavaRDD<Vector> so that it can be used with Spark's k-means clustering.

The typical pattern with Spark is to transform immutable objects into new ones, so transforming a DRM (Mahout distributed row matrix) or a collection of Mahout Vectors is the way you should do this. Beyond that, I'm not sure what you are asking.
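For the reading side, here is a minimal sketch, assuming the file was written with LongWritable keys and VectorWritable values as in the corrected method above. The class name VectorReader and the methods readPoints and cluster are illustrative, not from the original post:

import org.apache.hadoop.io.LongWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vectors;

public class VectorReader {

    // Read the sequence file back as a JavaRDD of Mahout vectors.
    public static JavaRDD<org.apache.mahout.math.Vector> readPoints(
            JavaSparkContext sc, String path) {
        // sequenceFile() reuses the same Writable instance across records,
        // so clone each vector before it leaves the closure.
        return sc.sequenceFile(path, LongWritable.class, VectorWritable.class)
                 .map(pair -> pair._2().get().clone());
    }

    // Spark MLlib's k-means works on org.apache.spark.mllib.linalg.Vector,
    // not Mahout's Vector, so convert each point to a dense MLlib vector.
    public static KMeansModel cluster(JavaRDD<org.apache.mahout.math.Vector> points,
            int k, int maxIterations) {
        JavaRDD<org.apache.spark.mllib.linalg.Vector> mllibPoints = points.map(v -> {
            double[] values = new double[v.size()];
            for (int i = 0; i < v.size(); i++) {
                values[i] = v.getQuick(i);
            }
            return Vectors.dense(values);
        });
        return KMeans.train(mllibPoints.rdd(), k, maxIterations);
    }
}

The clone() call matters because Hadoop's record readers recycle Writable objects; without it, every element of a cached RDD would end up pointing at the last vector read.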

