
print statement is not recorded in log file in spark-submit in cluster mode

I have the following PySpark code named sample.py with print statements:

import sys
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pyspark.sql.functions as f
from datetime import datetime
from time import time

if __name__ == '__main__':
    spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
    print("Print statement-1")
    schema = StructType([
        StructField("author", StringType(), False),
        StructField("title", StringType(), False),
        StructField("pages", IntegerType(), False),
        StructField("email", StringType(), False)
    ])

    data = [
        ["author1", "title1", 1, "author1@gmail.com"],
        ["author2", "title2", 2, "author2@gmail.com"],
        ["author3", "title3", 3, "author3@gmail.com"],
        ["author4", "title4", 4, "author4@gmail.com"]
    ]

    df = spark.createDataFrame(data, schema)
    print("Number of records",df.count())
    sys.exit(0)

The spark-submit command below, with output redirected to sample.log, does not capture the print statements:

spark-submit --master yarn --deploy-mode cluster sample.py > sample.log

The scenario is that we want to print some information to the log file so that, after the Spark job completes, we can perform other actions based on those print statements in the log file.

Please help me with this.

The print statements will not be found in the spark-submit output but rather in the YARN logs: in cluster mode the driver runs inside a YARN container, so its stdout is captured in the container logs instead of the spark-submit console. When you do spark-submit you will get an application ID which looks like this: application_1234567890123_12345.

Now run the following command with that application ID to get the aggregated YARN logs after the Spark job has completed:

yarn logs -applicationId <applicationId>
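
A minimal end-to-end sketch of this workflow, assuming YARN log aggregation is enabled and that the application ID appears in spark-submit's console output; the grep pattern and the file names sample.log / yarn_app.log are illustrative, not part of the original question:

spark-submit --master yarn --deploy-mode cluster sample.py 2>&1 | tee sample.log
# Extract the application ID reported by the YARN client (pattern is an assumption)
APP_ID=$(grep -oE 'application_[0-9]+_[0-9]+' sample.log | head -1)
# Fetch the aggregated container logs, which include the driver's stdout in cluster mode
yarn logs -applicationId "$APP_ID" > yarn_app.log
# Look for the output of the print statements from sample.py
grep "Number of records" yarn_app.log

Note that yarn logs only returns the aggregated logs after the application has finished, and only if log aggregation is enabled on the cluster; otherwise the driver's stdout has to be viewed through the ResourceManager / NodeManager web UI.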
