

How to log in the Spark Java API, given that evaluation is lazy?

Because evaluation in Spark is lazy, logs that trace the time taken to execute a statement can be misleading. For example: if a user captures a start time before reading a text file and an end time after (before and after sc.textFile()), the result will be wrong, because the log statements run immediately while the file has not actually been read yet due to lazy evaluation. Is there any solution to this, such as lazy logging?
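To make the pitfall concrete, here is a minimal sketch of the pattern described above, assuming a local Spark setup; the file path and class name are hypothetical:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MisleadingTiming {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("MisleadingTiming").setMaster("local[*]"));

        long start = System.currentTimeMillis();
        // textFile() is lazy: it only records the lineage, no data is read here
        JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input.txt");
        long elapsed = System.currentTimeMillis() - start;

        // Near zero regardless of file size: the read was never actually measured
        System.out.println("textFile() appeared to take " + elapsed + " ms");
        sc.stop();
    }
}
```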

You have to call a Spark action on your RDD in order to trigger the actual computation. Transformations such as map, filter, and reading a file do not trigger execution. To measure the effective time between your start and end log statements, invoke a Spark action on the RDD (and if you want to reuse the result later, call rdd.cache() so it is not recomputed).
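A minimal sketch of that advice, assuming the same local setup as above: time around an action such as count(), and mark the RDD with cache() if you plan to reuse it (the path and class name are again hypothetical):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TimedAction {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("TimedAction").setMaster("local[*]"));

        // cache() is itself lazy; it only marks the RDD to be kept in memory
        JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input.txt").cache();

        long start = System.currentTimeMillis();
        long count = lines.count(); // action: forces the file to actually be read
        long elapsed = System.currentTimeMillis() - start;

        System.out.println("Read " + count + " lines in " + elapsed + " ms");
        sc.stop();
    }
}
```

Because the RDD was cached, a second action on lines will read from memory rather than re-reading the file, so timing it separately shows the cost of the cached computation instead.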
