How to log Spark Dataset printSchema output at info/debug level in a Spark Java project

I am trying to convert my Spark Scala project into a Spark Java project. My Scala code logs as follows:

    import org.slf4j.Logger
    import org.slf4j.LoggerFactory

    class ClassName {
      val logger = LoggerFactory.getLogger("ClassName")
      ...
      val dataframe1 = ....///read dataframe from text file.
      ...

      logger.debug("dataframe1.printSchema : \n " + dataframe1.printSchema) // this is working fine
    }

Now I am trying to write it in Java 1.8 as below:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ClassName {

        public static final Logger logger = LoggerFactory.getLogger("ClassName");
        ...
        Dataset<Row> dataframe1 = ....///read dataframe from text file.
        ...

        logger.debug("dataframe1.printSchema : \n " + dataframe1.printSchema()); // this is not working

    }

I tried several approaches, but none of them let me log the printSchema output at debug/info level.

dataframe1.printSchema() actually returns void, so its result cannot be appended to a string.

How is logging actually done in production-grade Spark Java projects? What is the best approach to follow for debug logging?

How should I handle the scenario above, i.e. log.debug(dataframe1.printSchema()) in Java?

The printSchema method already prints the schema to the console without returning it in any form. You can simply call the method and redirect the console output somewhere else. There are other workarounds as well.
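To make the "redirect console output" idea concrete, here is a minimal sketch in plain Java (no Spark dependency). The class name ConsoleCapture and the method capture are hypothetical names made up for this illustration. It temporarily swaps System.out for an in-memory buffer and returns whatever was printed as a String. One loud assumption: it only captures text written through java.lang.System.out while the action runs. Spark is written in Scala, and Scala's console stream can be bound separately from System.out, so whether printSchema() output actually lands in this buffer needs to be checked against your Spark version.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Spark or SLF4J.
final class ConsoleCapture {

    private ConsoleCapture() {
    }

    // Runs the action and returns everything it wrote to System.out.
    // Not thread-safe: System.out is global, so other threads printing
    // at the same time would also end up in the buffer.
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // The (OutputStream, boolean, String) constructor exists in Java 8.
        try (PrintStream replacement = new PrintStream(buffer, true, "UTF-8")) {
            System.setOut(replacement);
            action.run();
            replacement.flush();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        } finally {
            // Always restore the real console, even if the action throws.
            System.setOut(original);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a real Dataset, the intended usage would look like logger.debug("schema:\n{}", ConsoleCapture.capture(() -> dataframe1.printSchema())), subject to the assumption stated above.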

You can use df.schema.treeString. Unlike df.printSchema, which returns Unit (the Scala equivalent of void in Java), this returns a String. This is true in Scala and I believe it is the same in Java, where the call would be written dataframe1.schema().treeString(). Let me know if that helps.

scala> val df = Seq(1, 2, 3).toDF()
df: org.apache.spark.sql.DataFrame = [value: int]

scala> val x = df.schema.treeString
x: String =
"root
 |-- value: integer (nullable = false)
"

scala> val y = df.printSchema
root
 |-- value: integer (nullable = false)

y: Unit = ()
