Spark Type mismatch: cannot convert from JavaRDD<Object> to JavaRDD<String>

I have started porting my PySpark application to Java. I am using Java 8 and have just begun running some basic Spark programs in Java. I used the following word-count example.

SparkConf conf = new SparkConf().setMaster("local").setAppName("Work Count App");

// Create a Java version of the Spark Context from the configuration
JavaSparkContext sc = new JavaSparkContext(conf);

JavaRDD<String> lines = sc.textFile(filename);

JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")))
                    .mapToPair(word -> new Tuple2(word, 1))
                    .reduceByKey((x, y) -> (Integer) x + (Integer) y)
                    .sortByKey();

I am getting a Type mismatch: cannot convert from JavaRDD<Object> to JavaRDD<String> error at lines.flatMap(line -> Arrays.asList(line.split(" "))). When I googled, every Java 8 based Spark example I found used the same implementation as above. What is wrong with my environment or my program?

Can someone help me?

Use this code. The actual issue is that the function passed to rdd.flatMap is expected to return an Iterator<String>, while your code returns a List<String>. Calling iterator() on the list fixes the problem. This applies from Spark 2.0 onward; in Spark 1.x the function returned an Iterable, which is why older examples compile without the extra call.

JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
            .mapToPair(word -> new Tuple2<String, Integer>(word, 1))
            .reduceByKey((x, y) ->  x +  y)
            .sortByKey();

counts.foreach(data -> {
        System.out.println(data._1()+"-"+data._2());
    });
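
The Iterator requirement described in this answer can be reproduced without Spark. The sketch below is only an illustration: FlatMapSketch, FlatMapFn and the flatMap helper are hypothetical stand-ins (not Spark classes) that mimic the shape of a function that must return an Iterator, simplified so that it declares no checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FlatMapSketch {

    // Hypothetical, simplified stand-in for a flat-map function type:
    // call() must return an Iterator, not an Iterable or a List.
    @FunctionalInterface
    interface FlatMapFn<T, R> {
        Iterator<R> call(T t);
    }

    // Hypothetical helper: applies fn to each input and concatenates the results.
    static <T, R> List<R> flatMap(List<T> input, FlatMapFn<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            Iterator<R> it = fn.call(item);
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");

        // Does not compile: the lambda would return List<String>, not Iterator<String>.
        // List<String> bad = flatMap(lines, line -> Arrays.asList(line.split(" ")));

        // Compiles: .iterator() turns the List into the expected Iterator.
        List<String> words =
            flatMap(lines, line -> Arrays.asList(line.split(" ")).iterator());

        System.out.println(words); // prints [to, be, or, not, to, be]
    }
}
```

Uncommenting the first call should produce a compile-time type error, because a List is not an Iterator. The same mismatch occurs in the question's flatMap call.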

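Try this code. Note that it compiles only against the Spark 1.x Java API, where the function passed to flatMap returns an Iterable; on Spark 2.x, append .iterator() to the Arrays.asList(...) call.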

JavaRDD<String> words =
    lines.flatMap(line -> Arrays.asList(line.split(" ")));
JavaPairRDD<String, Integer> counts =
    words.mapToPair(w -> new Tuple2<String, Integer>(w, 1))
         .reduceByKey((x, y) -> x + y);

JavaRDD<String> obj = jsc.textFile("<Text File Path>");
JavaRDD<String> obj1 = obj.flatMap(l -> {
    // Collect the words of each line, then return an Iterator as flatMap requires
    ArrayList<String> al = new ArrayList<>();
    String[] str = l.split(" ");
    for (int i = 0; i < str.length; i++) {
        al.add(str[i]);
    }
    return al.iterator();
});

Try this:

JavaRDD<String> words = input.flatMap(
        new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(s.split(" ")).iterator();
            }
        });
