
How to define a Spark RDD transformation with a non-lambda function

I recently started using Spark with Java. I am currently experimenting with RDD transformations and actions. At the moment I read data from a CSV that contains some DateTime fields, then apply a filter to keep only the rows that are less than 2 days old, and finally check whether the resulting RDD is empty. I wrote a simple snippet that does what I want at a minimal level.

Function<List<String>, Boolean> filterPredicate = row -> new DateTime(row.get(1)).isAfter(dtThreshold);

sc.textFile(inputFilePath)
            .map(text -> Arrays.asList(text.split(",")))
            .filter(filterPredicate)
            .isEmpty();
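
For completeness, this is a minimal, runnable sketch of what I have so far. It assumes Joda-Time for DateTime; the input path and the two-day threshold are placeholders of mine.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.joda.time.DateTime;

    public class RecentRowsCheck {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("RecentRowsCheck").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Assumption: "less than 2 days old" means after now minus two days.
            DateTime dtThreshold = DateTime.now().minusDays(2);
            Function<List<String>, Boolean> filterPredicate =
                    row -> new DateTime(row.get(1)).isAfter(dtThreshold);

            // "data/input.csv" is a placeholder path.
            boolean noRecentRows = sc.textFile("data/input.csv")
                    .map(text -> Arrays.asList(text.split(",")))
                    .filter(filterPredicate)
                    .isEmpty();

            System.out.println("No rows from the last two days: " + noRecentRows);
            sc.stop();
        }
    }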

In this simple case I have assumed that the DateTime values always live in a fixed column. I now want to expand this to use multiple column indexes. To do that I need to be able to define a predicate function with more than one line, which is why I separated the predicate definition from the transformation code.

How am I supposed to define such a function?

Use the curly brace notation...

   Function<List<String>, Boolean> filterPredicate = row -> {
        boolean isDateAfter = new DateTime(row.get(1)).isAfter(dtThreshold);
        // Compare string contents, not references: != "" is almost always true.
        boolean hasName = !row.get(2).isEmpty();
        return isDateAfter && hasName;
    };
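
If you need to check several date columns, one possible extension is to close over a list of column indexes. This is just a sketch: dateColumns and nameColumn are made-up names, and it reuses the dtThreshold from your question.

    // Hypothetical: require every listed date column to pass the threshold.
    List<Integer> dateColumns = Arrays.asList(1, 3);
    int nameColumn = 2;

    Function<List<String>, Boolean> filterPredicate = row -> {
        for (int i : dateColumns) {
            if (!new DateTime(row.get(i)).isAfter(dtThreshold)) {
                return false;
            }
        }
        return !row.get(nameColumn).isEmpty();
    };

Anything the lambda captures (dateColumns, nameColumn, dtThreshold) must be effectively final, which they are here.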
