
JavaRDD<String> to JavaRDD<Row>

I am reading a txt file as a JavaRDD with the following command:

JavaRDD<String> vertexRDD = ctx.textFile(pathVertex);

Now I would like to convert this to a JavaRDD<Row>, because the txt file has two columns of integers and I want to attach a schema to the rows after splitting the columns.

I also tried this:

JavaRDD<Row> rows = vertexRDD.map(line -> line.split("\t"));

But it says I cannot assign the result of map to an "Object" RDD.

  1. How can I create a JavaRDD<Row> out of a JavaRDD<String>?
  2. How can I apply map to the JavaRDD?

Thanks!

Creating a JavaRDD out of another one happens implicitly when you apply a transformation such as map. Here, the RDD you create is an RDD of arrays of strings (the result of split), which is why it cannot be assigned to a JavaRDD<Row>.

To get an RDD of rows, just create a Row from each array:

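JavaRDD<String> vertexRDD = ctx.textFile("");
// split returns a String[], so this RDD holds one array per line
JavaRDD<String[]> rddOfArrays = vertexRDD.map(line -> line.split("\t"));
// the cast makes explicit that each array element becomes one column of the Row
JavaRDD<Row> rddOfRows = rddOfArrays.map(fields -> RowFactory.create((Object[]) fields));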
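Note that the rows built above hold String values, because split returns strings. Since the question mentions two integer columns, one option is to parse each field before building the Row. The following is a minimal sketch of that parsing step in plain Java, with no Spark dependency; the class and method names (VertexLineParser, parseVertexLine) are hypothetical, and it assumes every column contains a valid integer.

```java
// Hypothetical helper, not part of the original answer.
class VertexLineParser {

    // Splits one tab-separated line and parses every column as an Integer.
    // Throws NumberFormatException if a column is not a valid integer.
    static Object[] parseVertexLine(String line) {
        String[] fields = line.split("\t");
        Object[] values = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Integer.valueOf(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        Object[] values = parseVertexLine("1\t42");
        System.out.println(values.length + " columns: " + values[0] + ", " + values[1]);
    }
}
```

With such a helper, the map call could become vertexRDD.map(line -> RowFactory.create(VertexLineParser.parseVertexLine(line))), so that the columns match an integer schema.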

Note that if your goal is then to turn the JavaRDD<Row> into a DataFrame (Dataset<Row>), there is a simpler way. You can set the delimiter option when using spark.read() and avoid RDDs altogether:

Dataset<Row> dataframe = spark.read()
    .option("delimiter", "\t")
    .csv("your_path/file.csv");  

You can define these two columns as fields of a class, and then use:

// rdd is a JavaRDD<ClassName>; each object becomes one Row
JavaRDD<Row> rows = rdd.map(new Function<ClassName, Row>() {
    @Override
    public Row call(ClassName target) throws Exception {
        return RowFactory.create(
                target.getField1(),
                target.getUsername());
    }
});
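The class itself is not shown in the answer. As a rough sketch, it could look like the following for the two integer columns from the question. The name VertexRecord, the getter getField2 and the fromLine factory are assumptions made for illustration (the answer's own example uses getUsername for its second field). The class implements Serializable because Spark generally needs to serialize objects stored in an RDD.

```java
// Hypothetical stand-in for "ClassName": one field per column of the text file.
class VertexRecord implements java.io.Serializable {
    private final int field1;
    private final int field2;

    VertexRecord(int field1, int field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    public int getField1() { return field1; }

    public int getField2() { return field2; }

    // Builds a record from one tab-separated line with exactly two integer columns.
    static VertexRecord fromLine(String line) {
        String[] parts = line.split("\t");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 2 columns but got " + parts.length);
        }
        return new VertexRecord(Integer.parseInt(parts[0].trim()),
                                Integer.parseInt(parts[1].trim()));
    }

    public static void main(String[] args) {
        VertexRecord record = fromLine("7\t13");
        System.out.println(record.getField1() + " " + record.getField2());
    }
}
```

A JavaRDD of such objects could then be obtained with vertexRDD.map(VertexRecord::fromLine) before converting each object to a Row.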

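Then create one StructField per column, collect them in a list (fields below), and finally use:

// fields is a List<StructField>, e.g. built with DataTypes.createStructField("field1", DataTypes.IntegerType, false)
StructType struct = DataTypes.createStructType(fields);
Dataset<Row> dataFrame = sparkSession.createDataFrame(rows, struct);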


 