I have started porting my PySpark application to Java. I am using Java 8 and have just begun running some basic Spark programs in Java. I used the following word count example.
SparkConf conf = new SparkConf().setMaster("local").setAppName("Work Count App");
// Create a Java version of the Spark Context from the configuration
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> lines = sc.textFile(filename);
JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")))
.mapToPair(word -> new Tuple2(word, 1))
.reduceByKey((x, y) -> (Integer) x + (Integer) y)
.sortByKey();
I am getting the error "Type mismatch: cannot convert from JavaRDD<Object> to JavaRDD<String>" on the expression lines.flatMap(line -> Arrays.asList(line.split(" "))).
When I googled, every Java 8 based Spark example I found used the same implementation. What is wrong with my environment or my program?
Can someone help me?
Use the code below. The actual issue is that the function passed to rdd.flatMap is expected to return an Iterator<String> (this is the signature in Spark 2.x), while your lambda returns a List<String>. Calling iterator() on the list fixes the problem.
JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
.mapToPair(word -> new Tuple2<String, Integer>(word, 1))
.reduceByKey((x, y) -> x + y)
.sortByKey();
counts.foreach(data -> {
System.out.println(data._1()+"-"+data._2());
});
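To make the Iterator requirement concrete without a Spark installation, here is a minimal plain-Java sketch. IteratorFlatMap and the flatMap helper are hypothetical stand-ins that only mimic the shape of Spark 2.x's FlatMapFunction (Spark's real interface also declares throws Exception); they are not Spark APIs. The TreeMap plays the role of reduceByKey followed by sortByKey.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlatMapShapeSketch {

    // Hypothetical stand-in shaped like Spark 2.x's FlatMapFunction:
    // one input element in, an Iterator over the output elements out.
    @FunctionalInterface
    interface IteratorFlatMap<T, R> {
        Iterator<R> call(T t);
    }

    // Applies f to every input element and collects everything the iterators yield.
    static <T, R> List<R> flatMap(List<T> input, IteratorFlatMap<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            Iterator<R> it = f.call(item);
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    // Counts words; the TreeMap keeps keys sorted, similar to sortByKey.
    static Map<String, Integer> countWords(List<String> lines) {
        // Dropping .iterator() here makes the lambda return List<String>,
        // which does not match Iterator<String> and fails to compile.
        List<String> words =
            flatMap(lines, line -> Arrays.asList(line.split(" ")).iterator());
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("to be or", "not to be"));
        counts.forEach((word, n) -> System.out.println(word + "-" + n));
    }
}
```

In Spark 1.x the corresponding interface returned an Iterable instead, which is why older examples that pass Arrays.asList(...) directly compiled there.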
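Try this code. Note that it compiles as written only on Spark 1.x, where flatMap expects an Iterable; on Spark 2.x, append .iterator() to the list: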
JavaRDD<String> words =
lines.flatMap(line -> Arrays.asList(line.split(" ")));
JavaPairRDD<String, Integer> counts =
words.mapToPair(w -> new Tuple2<String, Integer>(w, 1))
.reduceByKey((x, y) -> x + y);
JavaRDD<String> obj = jsc.textFile("<Text File Path>");
JavaRDD<String> obj1 = obj.flatMap(l -> {
    // Collect the words of one line, then return an Iterator over them
    ArrayList<String> al = new ArrayList<>();
    String[] str = l.split(" ");
    for (int i = 0; i < str.length; i++) {
        al.add(str[i]);
    }
    return al.iterator();
});
Try this:
JavaRDD<String> words = input.flatMap(
    new FlatMapFunction<String, String>() {
        @Override
        public Iterator<String> call(String s) {
            return Arrays.asList(s.split(" ")).iterator();
        }
    });