
Find the YARN ApplicationId of the current Spark job from the DRIVER node?

Is there a straightforward way to get the YARN ApplicationId of the current job from the DRIVER node when running under Amazon's Elastic MapReduce (EMR)? Spark is running in cluster mode.

Right now I'm using code that runs a map() operation on a worker to read the CONTAINER_ID environment variable. This seems inefficient. Here's the code:

import os

def applicationIdFromEnvironment():
    # CONTAINER_ID looks like container_<timestamp>_<appNumber>_<attempt>_<container>
    return "_".join(['application'] + os.environ['CONTAINER_ID'].split("_")[1:3])

def applicationId():
    """Return the Yarn (or local) applicationID.
    The environment variables are only set if we are running in a Yarn container.
    """

    # First check to see if we are running on the worker...
    try:
        return applicationIdFromEnvironment()
    except KeyError:
        pass

    # Perhaps we are running on the driver? If so, run a Spark job that finds it.
    try:
        from pyspark import SparkConf, SparkContext
        sc = SparkContext.getOrCreate()
        if "local" in sc.getConf().get("spark.master"):
            return f"local{os.getpid()}"
        # Note: make sure that the following map does not require access to any existing module.
        appid = sc.parallelize([1]).map(lambda x: "_".join(['application'] + os.environ['CONTAINER_ID'].split("_")[1:3])).collect()
        return appid[0]
    except ImportError:
        pass

    # Application ID cannot be determined.
    return f"unknown{os.getpid()}"

You can get the application ID directly from the SparkContext on the driver, using its applicationId property. No Spark job has to be run. The PySpark documentation describes it as:

A unique identifier for the Spark application. Its format depends on the scheduler implementation.

  • in case of local spark app something like 'local-1433865536131'

  • in case of YARN something like 'application_1433865536131_34483'

appid = sc.applicationId
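
As a rough sketch of how the question's applicationId() helper might be simplified with this property: the function names below (application_id_from_container, current_application_id) are hypothetical, and the handling of an optional "e<epoch>" segment in the container ID is an assumption about YARN's container ID layout rather than something stated in the question.

```python
import os


def application_id_from_container(container_id):
    """Derive a YARN application ID from a container ID string.

    Assumed layout: container_<timestamp>_<appNumber>_<attempt>_<container>,
    optionally with an extra e<epoch> segment right after "container".
    """
    parts = container_id.split("_")
    # Skip the optional epoch segment (assumption, e.g. "e17") if present.
    if len(parts) > 1 and parts[1].startswith("e"):
        parts = parts[:1] + parts[2:]
    return "_".join(["application"] + parts[1:3])


def current_application_id(sc=None, environ=None):
    """Return the application ID without launching a Spark job.

    sc is expected to be a SparkContext (or any object exposing an
    applicationId attribute); environ defaults to os.environ.
    """
    if sc is not None:
        # Driver side: read the property directly from the context.
        return sc.applicationId
    if environ is None:
        environ = os.environ
    if "CONTAINER_ID" in environ:
        # Inside a YARN container: fall back to the environment variable.
        return application_id_from_container(environ["CONTAINER_ID"])
    # Application ID cannot be determined.
    return f"unknown{os.getpid()}"
```

On the driver this would be called as current_application_id(SparkContext.getOrCreate()). Inside a YARN container with no context at hand, current_application_id() falls back to CONTAINER_ID.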
