

Apache Spark using native dependencies - driver/executor code flow in standalone mode

I have set up Spark in standalone mode (a single node on my laptop) and am trying to integrate OpenCV to read a set of images from a directory and detect faces in each image. I am trying to understand how native dependencies get shipped to the executor JVMs. I would have expected that in the program below, the System.loadLibrary call would execute only in the driver JVM, and that the executor JVMs would fail when the anonymous function tried to find the native library. Contrary to my understanding, though, the program works fine. Can someone explain how this works, and which part of the code is shipped from the driver to the executors?

 package com.xxx;

 import java.io.File;
 import java.util.List;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.input.PortableDataStream;
 import org.opencv.core.Core;
 import org.opencv.core.Mat;
 import org.opencv.core.MatOfRect;
 import org.opencv.core.Point;
 import org.opencv.core.Rect;
 import org.opencv.core.Scalar;
 import org.opencv.imgcodecs.Imgcodecs;
 import org.opencv.imgproc.Imgproc;
 import org.opencv.objdetect.CascadeClassifier;
 import scala.Tuple2;

 public class MainSparkImage
 {
    public static void main(String[] args)
    {
        SparkConf conf = new SparkConf().setMaster("spark://localhost:7077").setAppName("Image detect App");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the OpenCV native library into this (the driver) JVM
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        CascadeClassifier faceDetector = new CascadeClassifier(
                "/home/xx/Project/opencv-3.1.0/data/haarcascades_cuda/haarcascade_frontalface_alt.xml");

        File tempDir = new File("/home/xx/images/new");
        String tempDirName = tempDir.getAbsolutePath();

        // Read the image files as (path, stream) pairs, split across 3 partitions
        JavaPairRDD<String, PortableDataStream> readRDD = sc.binaryFiles(tempDirName, 3);
        List<Tuple2<String, PortableDataStream>> result = readRDD.collect();
        for (Tuple2<String, PortableDataStream> res : result)
        {
            Mat image = Imgcodecs.imread(res._1().replace("file:", ""));

            MatOfRect faceDetections = new MatOfRect();
            faceDetector.detectMultiScale(image, faceDetections);

            // Draw a green rectangle around each detected face
            for (Rect rect : faceDetections.toArray()) {
                Imgproc.rectangle(image, new Point(rect.x, rect.y),
                        new Point(rect.x + rect.width, rect.y + rect.height),
                        new Scalar(0, 255, 0));
            }
            String filename = res._1().replace("file:", "") + "_out";
            Imgcodecs.imwrite(filename, image);
        }
    }
 }

I created a jar with the above program and ran the following spark-submit command; it works fine, as expected:

./bin/spark-submit --verbose --master spark://localhost:7077 --num-executors 2 --class com.xxx.MainSparkImage --jars /home/xx/Project/opencv-3.1.0/release/bin/opencv-310.jar --driver-library-path /home/xx/Project/opencv-3.1.0/release/lib /home/xx/ImageProcess.jar

Thanks, Srivatsan

List<Tuple2<String, PortableDataStream>> result = readRDD.collect(); 

This line causes the RDD to be collected back to the driver as a local collection. The rest of the code (the for loop) then executes locally within the driver, so the executors never call into OpenCV at all. That is why you don't see any errors about missing native libraries on the executors: the only JVM that needs the native library is the driver, and --driver-library-path has made it available there.
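If you did want the detection to run on the executors, the closure shipped to them would have to load the native library itself, and the library path would have to be visible on every worker, for example via the spark.executor.extraLibraryPath setting. Below is a minimal sketch of that approach, not your original code: it assumes the OpenCV jar, native library, and cascade file exist at the same paths on each worker, and it constructs the CascadeClassifier inside the task because it wraps a native handle and is not serializable.

 // Sketch: push the OpenCV work to the executors instead of the driver.
 // Assumes spark-submit also passes, e.g.:
 //   --conf spark.executor.extraLibraryPath=/home/xx/Project/opencv-3.1.0/release/lib
 readRDD.foreach(res -> {
     // Runs on an executor; repeated loadLibrary calls with the same
     // name in the same JVM are ignored, so this is safe per task.
     System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

     CascadeClassifier detector = new CascadeClassifier(
             "/home/xx/Project/opencv-3.1.0/data/haarcascades_cuda/haarcascade_frontalface_alt.xml");

     Mat image = Imgcodecs.imread(res._1().replace("file:", ""));
     MatOfRect faces = new MatOfRect();
     detector.detectMultiScale(image, faces);
     Imgcodecs.imwrite(res._1().replace("file:", "") + "_out", image);
 });

In your original program none of this is needed, precisely because collect() moves all the data, and therefore all the OpenCV work, onto the driver.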
