
How to define a Spark RDD transformation with a non-lambda function

I recently started working with Spark and Java. I am currently experimenting with RDD transformations and actions. At the moment I am reading data out of a CSV file that contains some DateTime fields, then applying a filter to keep only the rows that are younger than two days, and finally checking whether the resulting RDD is empty. I wrote a simple snippet that does what I want at a minimal level:

Function<List<String>, Boolean> filterPredicate = row -> new DateTime(row.get(1)).isAfter(dtThreshold);

sc.textFile(inputFilePath)
            .map(text -> Arrays.asList(text.split(",")))
            .filter(filterPredicate)
            .isEmpty();
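For context, a minimal self-contained sketch of this pipeline might look as follows. It assumes Spark's Java API and Joda-Time for DateTime; the input path, app name, and the two-day threshold are illustrative assumptions, not part of the original question.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.joda.time.DateTime;

    public class RecentRowsCheck {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("RecentRowsCheck").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            String inputFilePath = "data/input.csv";               // hypothetical input path
            final DateTime dtThreshold = DateTime.now().minusDays(2);

            // The threshold is captured by the lambda, so it must be effectively final
            Function<List<String>, Boolean> filterPredicate =
                    row -> new DateTime(row.get(1)).isAfter(dtThreshold);

            boolean empty = sc.textFile(inputFilePath)
                    .map(text -> Arrays.asList(text.split(",")))
                    .filter(filterPredicate)
                    .isEmpty();

            System.out.println("No rows younger than two days: " + empty);
            sc.close();
        }
    }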

In this simple case I have assumed that the DateTime values always live in a single fixed column. I now want to expand that to use multiple column indexes. But to do that I need to be able to define a predicate function that spans more than one line. That is the reason why I have separated the predicate function definition from the transformation code.

How am I supposed to define such a function?

Use the curly brace notation...

    Function<List<String>, Boolean> filterPredicate = row -> {
        // Parse the date column and compare it with the threshold
        boolean isDateAfter = new DateTime(row.get(1)).isAfter(dtThreshold);
        // Compare string content rather than references: != on Strings checks identity
        boolean hasName = !row.get(2).isEmpty();
        return isDateAfter && hasName;
    };
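If the goal is to apply the same date check to several columns, a sketch along these lines could work; the column indexes (1 and 3) and their meanings are assumptions for illustration:

    // Hypothetical layout: a created-at date in column 1 and an updated-at date in column 3
    final List<Integer> dateColumns = Arrays.asList(1, 3);

    Function<List<String>, Boolean> filterPredicate = row -> {
        // Every configured date column must be younger than the threshold
        for (int idx : dateColumns) {
            if (!new DateTime(row.get(idx)).isAfter(dtThreshold)) {
                return false;
            }
        }
        return true;
    };

Because the lambda body is a full block, any number of statements can go inside, as long as every captured variable (such as dateColumns and dtThreshold here) is effectively final.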
