
How to use the scala.collection.immutable.Stream class from Java

I have existing Scala code and am trying to write the same code in Java, but I am running into an issue.

Scala code:

import java.io.{BufferedReader, InputStreamReader}
import java.util.zip.ZipInputStream
import org.apache.spark.SparkContext
import org.apache.spark.input.PortableDataStream
import org.apache.spark.rdd.RDD

def readFile(path: String, minPartitions: Int): RDD[String] = {
  if (path.endsWith(".zip")) {
    sc.binaryFiles(path, minPartitions)
      .flatMap {
        case (name: String, content: PortableDataStream) =>
          val zis = new ZipInputStream(content.open)
          val entry = zis.getNextEntry
          val br = new BufferedReader(new InputStreamReader(zis))
          Stream.continually(br.readLine()).takeWhile(_ != null)
      }
  }
}

I have written the following Java code:

import org.apache.spark.input.PortableDataStream;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public RDD<String> readFile(String inputDir, int minPartitions) throws Exception {
    SparkSession sparkSession = null;
    sparkSession = SparkSession.builder().appName("zipPoc").config("spark.master", "yarn").getOrCreate();

    JavaSparkContext sc = new JavaSparkContext(sparkSession.sparkContext());
    if (inputDir.endsWith(".zip")) {
        sc.binaryFiles(inputDir, minPartitions).flatMap (
            (String name , PortableDataStream content) -> {
                ZipInputStream stream = new ZipInputStream(content.open());
                ZipEntry entry = stream.getNextEntry();
                BufferedReader br = new BufferedReader(new InputStreamReader(stream));
                scala.collection.immutable.Stream.continually(br.readLine()).takeWhile(_ != null);
            }
        );
    }

}

I am getting the error below.

[Image: enter image description here]

Does anyone have a clue about this and can help with the appropriate code?

continually expects a lambda that takes no parameters and returns a value. The Java equivalent would be:

() -> br.readLine()

There is also no _ placeholder in Java, so you have to use an explicit parameter:

(line) -> line != null

So this should work:

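// Stream here is scala.collection.immutable.Stream; java.io.IOException must be imported too.
Stream.continually(() -> {
    try {
        return br.readLine();            // returns null at end of input
    } catch (IOException e) {
        throw new RuntimeException(e);   // rethrow the checked exception as unchecked
    }
}).takeWhile((line) -> line != null)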
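If you want to try the two lambda shapes without scala-library on the classpath, here is a minimal JDK-only sketch of the same "keep calling, stop at the first null" idea. Note that it does not use Scala's Stream at all: ContinuallySketch, continuallyWhile and readLines are names I made up for illustration, and the result is a plain java.util.Iterator (which, as far as I know, is also what a Java flatMap function in Spark 2.x and later is expected to return).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for Stream.continually(...).takeWhile(...), JDK only.
class ContinuallySketch {

    // Lazily calls source.get() and yields values for as long as keep.test(value) holds.
    static <T> Iterator<T> continuallyWhile(Supplier<T> source, Predicate<T> keep) {
        return new Iterator<T>() {
            private T pending;        // value fetched but not yet handed out
            private boolean fetched;  // true while pending holds a fetched value
            private boolean done;     // true once the predicate has failed

            @Override
            public boolean hasNext() {
                if (done) {
                    return false;
                }
                if (!fetched) {
                    pending = source.get();
                    fetched = true;
                    if (!keep.test(pending)) {
                        done = true;
                    }
                }
                return !done;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                fetched = false;
                return pending;
            }
        };
    }

    // Reads lines until readLine() returns null, using the two lambda shapes from the answer.
    static List<String> readLines(BufferedReader br) {
        Iterator<String> it = continuallyWhile(
                () -> {                               // no parameters, returns a value
                    try {
                        return br.readLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                },
                line -> line != null);                // explicit parameter instead of _
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader("first\nsecond"));
        System.out.println(readLines(br));
    }
}
```

The iterator only calls readLine() when the consumer asks for the next element, so it stays lazy in the same spirit as the Scala version.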

====

As you noticed, readLine throws a checked exception. The quickest fix is to wrap the call in try/catch.
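If the try/catch inside the lambda gets noisy, one option is to move the wrapping into a small helper. This is only a sketch using JDK types: IOSupplier, unchecked and firstLine are hypothetical names, and the helper returns java.util.function.Supplier. For Scala's continually I would assume the same pattern works with scala.Function0 as the return type, but I have not shown or tested that variant here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

class UncheckedSketch {

    // Hypothetical functional interface: like Supplier, but allowed to throw IOException.
    @FunctionalInterface
    interface IOSupplier<T> {
        T get() throws IOException;
    }

    // Adapts an IOSupplier into a plain Supplier, rethrowing IOException as unchecked.
    static <T> Supplier<T> unchecked(IOSupplier<T> body) {
        return () -> {
            try {
                return body.get();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    // Usage: the call site shrinks to a method reference.
    static String firstLine(String text) {
        BufferedReader br = new BufferedReader(new StringReader(text));
        Supplier<String> nextLine = unchecked(br::readLine);
        return nextLine.get();   // null if the text has no lines
    }

    public static void main(String[] args) {
        System.out.println(firstLine("hello\nworld"));
    }
}
```

UncheckedIOException is used here instead of a bare RuntimeException so the original IOException stays available via getCause(); either choice satisfies the compiler.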

