
How to define a Spark RDD transformation with a non-lambda function

I recently started using Spark with Java. I am currently experimenting with RDD transformations and actions. At the moment I read data from a CSV that contains some DateTime fields, then apply a filter to keep only the rows that are less than 2 days old, and finally check whether the resulting RDD is empty. I wrote a simple snippet that does what I want at a minimal level.

Function<List<String>, Boolean> filterPredicate = row -> new DateTime(row.get(1)).isAfter(dtThreshold);

sc.textFile(inputFilePath)
            .map(text -> Arrays.asList(text.split(",")))
            .filter(filterPredicate)
            .isEmpty();
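
For completeness, this is a minimal, runnable sketch of what I have so far. It assumes Joda-Time for DateTime; the input path and the two-day threshold are placeholders of mine.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.joda.time.DateTime;

    public class RecentRowsCheck {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("RecentRowsCheck").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Assumption: "less than 2 days old" means after now minus two days.
            DateTime dtThreshold = DateTime.now().minusDays(2);
            Function<List<String>, Boolean> filterPredicate =
                    row -> new DateTime(row.get(1)).isAfter(dtThreshold);

            // "data/input.csv" is a placeholder path.
            boolean noRecentRows = sc.textFile("data/input.csv")
                    .map(text -> Arrays.asList(text.split(",")))
                    .filter(filterPredicate)
                    .isEmpty();

            System.out.println("No rows from the last two days: " + noRecentRows);
            sc.stop();
        }
    }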

In this simple case I have assumed that the DateTime values always live in a fixed column. I now want to expand this to use multiple column indexes. To do that I need to be able to define a predicate function with more than one line, which is why I separated the predicate definition from the transformation code.

How am I supposed to define such a function?

Use the curly brace notation...

   Function<List<String>, Boolean> filterPredicate = row -> {
        boolean isDateAfter = new DateTime(row.get(1)).isAfter(dtThreshold);
        // Compare string contents, not references: != "" is almost always true.
        boolean hasName = !row.get(2).isEmpty();
        return isDateAfter && hasName;
    };
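
If you need to check several date columns, one possible extension is to close over a list of column indexes. This is just a sketch: dateColumns and nameColumn are made-up names, and it reuses the dtThreshold from your question.

    // Hypothetical: require every listed date column to pass the threshold.
    List<Integer> dateColumns = Arrays.asList(1, 3);
    int nameColumn = 2;

    Function<List<String>, Boolean> filterPredicate = row -> {
        for (int i : dateColumns) {
            if (!new DateTime(row.get(i)).isAfter(dtThreshold)) {
                return false;
            }
        }
        return !row.get(nameColumn).isEmpty();
    };

Anything the lambda captures (dateColumns, nameColumn, dtThreshold) must be effectively final, which they are here.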
