I am trying to get the application output of a Spark run and cannot find a straightforward way to do that.
Basically, I am talking about the content of the <spark install dir>/work directory on each cluster worker.
I could have copied that directory to the location I need, but with 100500 nodes that simply doesn't scale.
The other option I was considering is to attach an exit function (like a trap in bash) that fetches the logs from each worker as part of the app run. I just think there has to be a better solution than that.
Yeah, I know that I can use the YARN or Mesos cluster manager to get the logs; however, it seems really weird to me that I cannot do such a convenient thing with the default cluster manager.
Thanks a lot.
In the end I went for the following solution (Python):
import os
import tarfile
from io import BytesIO

from pyspark.sql import SparkSession

# Get the Spark session.
spark = SparkSession.builder.appName("my-spark-app").getOrCreate()

# Get the executor working directories.
spark_home = os.environ.get('SPARK_HOME')
if spark_home:
    # Count the workers listed in conf/slaves.
    num_workers = 0
    with open(os.path.join(spark_home, 'conf', 'slaves'), 'r') as f:
        for line in f:
            # Skip blank lines and comments so they are not counted as workers.
            if line.strip() and not line.lstrip().startswith('#'):
                num_workers += 1
    if num_workers:
        executor_logs_path = '/where/to/store/executor_logs'

        def _map(worker):
            '''Returns a one-element list with a tuple of the archive name and
            the tar.gz bytes of the work directory on the node running this task.
            '''
            flo = BytesIO()
            with tarfile.open(fileobj=flo, mode="w:gz") as tar:
                tar.add(os.path.join(spark_home, 'work'), arcname='work')
            return [('worker_%d_dir.tar.gz' % worker, flo.getvalue())]

        def _reduce(worker1, worker2):
            '''Concatenates two lists of (name, tar.gz bytes) tuples.'''
            worker1.extend(worker2)
            return worker1

        # Do not fail if the target directory already exists.
        os.makedirs(executor_logs_path, exist_ok=True)
        # One task per listed worker; note that Spark does not guarantee that
        # each task actually runs on a different worker.
        logs = (spark.sparkContext
                .parallelize(range(num_workers), num_workers)
                .map(_map)
                .reduce(_reduce))
        # Bundle the per-worker archives into a single tar file on the driver.
        with tarfile.open(os.path.join(executor_logs_path, 'logs.tar'), 'w') as tar:
            for name, data in logs:
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(tarinfo=info, fileobj=BytesIO(data))
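For anyone who wants to check the packing logic without a cluster, here is a minimal local sketch of the same round trip: tar.gz a directory in memory, wrap the resulting blobs into one outer tar, then read it back. The helper names (pack_dir, write_bundle, list_bundle, demo) and the fake work/app-0001/0/stderr layout are made up for illustration; no Spark is involved, so it says nothing about where tasks actually run.

```python
import os
import tarfile
import tempfile
from io import BytesIO


def pack_dir(path, arcname):
    """Return the tar.gz bytes of a directory, built entirely in memory."""
    buf = BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(path, arcname=arcname)
    return buf.getvalue()


def write_bundle(entries, bundle_path):
    """Write (name, bytes) pairs as members of one uncompressed tar file."""
    with tarfile.open(bundle_path, "w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(tarinfo=info, fileobj=BytesIO(data))


def list_bundle(bundle_path):
    """Map each inner archive name to the sorted member names it contains."""
    contents = {}
    with tarfile.open(bundle_path, "r") as outer:
        for member in outer.getmembers():
            inner_bytes = outer.extractfile(member).read()
            with tarfile.open(fileobj=BytesIO(inner_bytes), mode="r:gz") as inner:
                contents[member.name] = sorted(inner.getnames())
    return contents


def demo():
    """Build a fake work directory, bundle it twice, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for <spark install dir>/work on a worker.
        work = os.path.join(tmp, "work")
        os.makedirs(os.path.join(work, "app-0001", "0"))
        with open(os.path.join(work, "app-0001", "0", "stderr"), "w") as f:
            f.write("hypothetical executor output\n")
        # Pretend two workers each returned an archive of their work directory.
        entries = [("worker_%d_dir.tar.gz" % i, pack_dir(work, "work"))
                   for i in range(2)]
        bundle = os.path.join(tmp, "logs.tar")
        write_bundle(entries, bundle)
        return list_bundle(bundle)


if __name__ == "__main__":
    for archive, names in demo().items():
        print(archive, names)
```

Keep in mind that every archive passes through driver memory in one piece, so this approach is probably only reasonable while the work directories stay small.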
A couple of concerns though: