I've got a weird problem: when I call count() on a DataSet before other processing (a BulkIteration), Apache Flink executes only the plan for count() and skips my other operations. I couldn't find anything about this in the logs.
Furthermore, this doesn't happen in my IDE, where all the operations run. The problem occurs only when I upload the job via the web UI.
So, is this a general problem? How can I solve it without having to compute the count myself?
Thanks!
UPDATE:
The code does something like this (I know this example isn't well designed for production code, but it shows my problem).
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.aggregation.Aggregations;
import org.apache.flink.api.java.tuple.Tuple1;

import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class CountProblemExample {

    public static void main(String[] args) throws Exception {
        // Build a random-sized list of random doubles as input
        Random rnd = new Random();
        int randomNumber = 100000 + rnd.nextInt(100000);
        List<Double> doubles = new LinkedList<>();
        for (int i = 0; i < randomNumber; i++) {
            doubles.add(rnd.nextDouble());
        }

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Double> doubleDataSet = env.fromCollection(doubles);

        // When submitted via the web UI, execution stops after this call
        final int count = (int) doubleDataSet.count();

        // Sum all values, then divide by the count to get the average
        DataSet<Double> avgSet = doubleDataSet
            .map(new MapFunction<Double, Tuple1<Double>>() {
                @Override
                public Tuple1<Double> map(Double value) throws Exception {
                    return new Tuple1<>(value);
                }
            })
            .aggregate(Aggregations.SUM, 0)
            .map(new MapFunction<Tuple1<Double>, Double>() {
                @Override
                public Double map(Tuple1<Double> t) throws Exception {
                    double avg = 0;
                    if (count > 0) {
                        avg = t.f0 / count;
                    }
                    return avg;
                }
            });

        double avg = avgSet
            .collect()
            .get(0);
        System.out.println(avg);
    }
}
You probably forgot to call ExecutionEnvironment.execute(). A DataSet job is not executed until you call that method. DataSet.count() and DataSet.collect() internally trigger an execution as well.
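To make the lazy/eager distinction in this answer concrete, here is a toy model in plain Java. It is only a sketch and not Flink code: the class LazyPlanSketch, its nested LazyData type, and the executions counter are hypothetical names invented for illustration. Transformations such as map() are only recorded, while count(), collect(), and execute() each run the recorded plan once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy model of a lazily evaluated pipeline. Hypothetical; this is not Flink's API. */
public class LazyPlanSketch {

    /** Records transformations and runs them only when an eager call asks for it. */
    static final class LazyData {
        private final List<Double> source;
        private final List<Function<Double, Double>> steps = new ArrayList<>();
        int executions = 0; // how many times the recorded plan was actually run

        LazyData(List<Double> source) {
            this.source = source;
        }

        // Lazy: only records the step; nothing is computed here.
        LazyData map(Function<Double, Double> f) {
            steps.add(f);
            return this;
        }

        // Eager: runs the recorded plan and returns the results.
        List<Double> collect() {
            executions++;
            List<Double> out = new ArrayList<>();
            for (Double value : source) {
                Double x = value;
                for (Function<Double, Double> step : steps) {
                    x = step.apply(x);
                }
                out.add(x);
            }
            return out;
        }

        // Eager: runs the plan once and returns the number of results.
        long count() {
            return collect().size();
        }

        // Eager: runs the plan once and discards the results.
        void execute() {
            collect();
        }
    }

    public static void main(String[] args) {
        LazyData data = new LazyData(Arrays.asList(1.0, 2.0, 3.0)).map(x -> x * 2);
        System.out.println(data.executions); // 0: map() alone ran nothing
        System.out.println(data.count());    // 3: first execution
        System.out.println(data.collect());  // [2.0, 4.0, 6.0]: second execution
        data.execute();                      // third execution
        System.out.println(data.executions); // 3
    }
}
```

In this model every eager call is a separate run of the plan, which mirrors the point that a program calling count() and later collect() or execute() launches more than one execution.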