
How to log a lazily evaluated DataFrame in Apache Spark?

How can I do logging in a Spark application without the logger statements triggering an action?

I'd like to be able to do something like:

df = df
  .logInfo("value is " + col("xyz"));

Is it possible in Java?
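There is no built-in logInfo on Dataset, but one way to approximate this fluent, lazy style is a map transformation whose side effect is the logging call. The following is only a sketch under a few assumptions: slf4j as the logging facade, the column "xyz" from my pseudocode above, and RowEncoder.apply for the Row encoder (available up to Spark 3.4; Spark 3.5+ offers Encoders.row instead). Note that the log lines are emitted lazily on the executors, whenever a downstream action finally runs, so they land in the executor logs rather than the driver log:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.slf4j.LoggerFactory;

// Lazily log each row's "xyz" value as data flows through the plan.
Dataset<Row> logged = df.map(
    (MapFunction<Row, Row>) row -> {
        // Runs on the executors, and only when an action triggers evaluation.
        LoggerFactory.getLogger("lazy-log").info("value is {}", row.getAs("xyz"));
        return row;  // pass the row through unchanged
    },
    RowEncoder.apply(df.schema()));  // on Spark 3.5+: Encoders.row(df.schema())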

When I read your pseudocode, it sounds like you want to log a few elements of the column (5? 10?). How do you expect the elements to be rendered in the log file? As a visual ASCII table, like show()?

What I have done in similar cases is fetch the first record and simply show it on the console, as in:

df.show(1);

It won't evaluate the whole dataset in the DAG. If you need to access the values themselves, use:

Row r = df.first();   // an action, but it only fetches a single record
log.debug(r.mkString());

See the Javadoc for Row#mkString: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html#mkString-java.lang.String-

I assumed df was an instance of Dataset<Row> .
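For completeness, here is how those pieces might fit into a self-contained program. This is a sketch: the app name, master URL, and input path are placeholders, and slf4j is assumed as the logging facade:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugLogExample {
    private static final Logger log = LoggerFactory.getLogger(DebugLogExample.class);

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("lazy-df-logging")   // placeholder name
                .master("local[*]")           // assumption: local run for illustration
                .getOrCreate();

        // Hypothetical input; substitute your own data source.
        Dataset<Row> df = spark.read().json("people.json");

        // show(1) implies a limit, so Spark typically avoids
        // materializing the full dataset just to print one row.
        df.show(1);

        // first() is also an action, but it likewise fetches just one record.
        Row r = df.first();
        log.debug(r.mkString(", "));

        spark.stop();
    }
}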
