
print statement output is not recorded in the log file when using spark-submit in cluster mode

I have the following PySpark script, sample.py, which contains print statements:

import sys
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pyspark.sql.functions as f
from datetime import datetime
from time import time

if __name__ == '__main__':
    spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
    print("Print statement-1")
    schema = StructType([
        StructField("author", StringType(), False),
        StructField("title", StringType(), False),
        StructField("pages", IntegerType(), False),
        StructField("email", StringType(), False)
    ])

    data = [
        ["author1", "title1", 1, "author1@gmail.com"],
        ["author2", "title2", 2, "author2@gmail.com"],
        ["author3", "title3", 3, "author3@gmail.com"],
        ["author4", "title4", 4, "author4@gmail.com"]
    ]

    df = spark.createDataFrame(data, schema)
    print("Number of records", df.count())
    sys.exit(0)

The spark-submit command below redirects its output to sample.log, but the output of the print statements does not appear in that file:

spark-submit --master yarn --deploy-mode cluster sample.py > sample.log

We want to write some information to the log file so that, once the Spark job completes, we can take further actions based on what was printed.

Please help me with this.

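In cluster mode the driver runs on the cluster, inside the YARN ApplicationMaster container, rather than in the spark-submit process. The output of the print statements therefore ends up in the YARN container logs, not in the spark-submit output. When you run spark-submit you get an application ID that looks like this: application_1234567890123_12345.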
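If the application ID needs to be picked up by a script rather than read by eye, something like the following might work. It is only a sketch: it assumes the text written by spark-submit has already been captured as a string (with the default logging configuration that text usually goes to stderr, so a plain `>` redirect would miss it), and the helper name extract_application_id and the sample string are made up for illustration.

```python
import re

# YARN application IDs have the form application_<digits>_<digits>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_application_id(submit_output):
    """Return the first YARN application ID found in the text, or None."""
    match = APP_ID_PATTERN.search(submit_output)
    return match.group(0) if match else None


# Made-up sample text, not real spark-submit output.
sample_output = "submitted application_1234567890123_12345 to the cluster"
print(extract_application_id(sample_output))  # prints application_1234567890123_12345
```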

Once the Spark job has completed, run the following command with that application ID to get the aggregated YARN logs:

yarn logs -applicationId <applicationId>
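Since the goal in the question is to act on what was printed, the same command could be driven from Python and its output filtered for the print statements. The sketch below assumes the yarn CLI is on the PATH and that log aggregation is enabled on the cluster; fetch_yarn_logs and find_marker_lines are hypothetical helper names, and the sample log text is invented for the demo.

```python
import subprocess


def fetch_yarn_logs(application_id):
    """Run `yarn logs -applicationId <id>` and return its stdout as text."""
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", application_id],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the yarn command fails
    )
    return result.stdout


def find_marker_lines(log_text, marker):
    """Return every line of log_text that contains the marker string."""
    return [line for line in log_text.splitlines() if marker in line]


# Invented sample text standing in for real aggregated logs.
sample_logs = "some container output\nPrint statement-1\nNumber of records 4\n"
print(find_marker_lines(sample_logs, "Number of records"))  # prints ['Number of records 4']
```

On a real cluster the two helpers would be combined, for example find_marker_lines(fetch_yarn_logs(app_id), "Number of records"), and the follow-up action would be decided from the returned lines.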
