I am porting a Python app to Java and am stuck on the correct way to implement a lambda flatMap(). I am parsing IP logs and need to split on ' ' (space).
My environment:
What I have is:
Load the dataframe:
Dataset<Row> MyLog = spark.sql("Select RecordNumber, IpAddress from Table");
MyLog.createOrReplaceTempView("MyLog");
Now attempt the lambda flatMap():
Dataset<String> Mylog2 = Mylog.flatMap(e -> String.asList(e.split(' ')));
I have tried several variations of this, including:
Dataset<Row> Mylog2 = Mylog.flatMap(e -> Array.asList(e.split(' ')));
Dataset<String> Mylog2 = Mylog.flatMap(lambda(e -> String.asList(e.split(' '))));
etc.
The original Python looked like this:
Mylog2 = Mylog.rdd.flatMap(lambda(x,y): ((x,v) for v in y.split(' ')))
I would appreciate any insight into the correct way to implement this in Java using Spark.
Thank you.
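What about:
Dataset<String> Mylog2 = Mylog.flatMap((FlatMapFunction<Row, String>) row -> java.util.Arrays.stream(row.getString(1).split(" ")).iterator(), Encoders.STRING());
Note that String.split takes a String (a regex), so the delimiter must be " " rather than the char literal ' '. The cast to org.apache.spark.api.java.function.FlatMapFunction may be needed so the compiler can choose between the Java and Scala overloads of flatMap.
But which column do you want to split? IpAddress?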
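The Python original emits a (RecordNumber, token) pair for every token, not just the tokens. A minimal sketch of that per-row logic in plain Java follows, deliberately without any Spark dependency so it runs on its own. The class name FlatMapSketch, the helpers explode and explodeAll, and the sample rows are all hypothetical and only serve as illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMapSketch {
    // Hypothetical helper: emits one (recordNumber, token) pair per
    // space-separated token, like ((x, v) for v in y.split(' ')) in the Python.
    static List<Map.Entry<Integer, String>> explode(int recordNumber, String ipAddress) {
        List<Map.Entry<Integer, String>> pairs = new ArrayList<>();
        for (String token : ipAddress.split(" ")) {
            pairs.add(new SimpleEntry<>(recordNumber, token));
        }
        return pairs;
    }

    // Applies explode to every row and flattens the per-row lists into one list.
    static List<Map.Entry<Integer, String>> explodeAll(List<Map.Entry<Integer, String>> rows) {
        return rows.stream()
                .flatMap(row -> explode(row.getKey(), row.getValue()).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for (RecordNumber, IpAddress).
        List<Map.Entry<Integer, String>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<>(1, "10.0.0.1 10.0.0.2"));
        rows.add(new SimpleEntry<>(2, "192.168.1.5"));

        for (Map.Entry<Integer, String> pair : explodeAll(rows)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

In Spark, the body of explode would presumably go inside the FlatMapFunction passed to Dataset.flatMap, returning an iterator of Tuple2 values with an encoder such as Encoders.tuple(Encoders.INT(), Encoders.STRING()). That assumes RecordNumber is an integer column, which the question does not state.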