

Apache Spark using native dependencies - driver/executor code flow in standalone mode

I have set up Spark in standalone mode (a single node on my laptop) and am trying to integrate OpenCV to read a set of images from a directory and detect faces in each image. I am trying to understand how native dependencies get shipped to the executor JVMs. I would have expected that in the program below, the System.loadLibrary call would execute only in the driver JVM, and that the executor JVMs would fail when the anonymous function tried to find the native library. Contrary to my understanding, though, the program works fine. Can someone explain how this works, and which part of the code is shipped from the driver to the executors?

 package com.xxx;

 import java.io.File;
 import java.util.List;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.input.PortableDataStream;
 import org.opencv.core.Core;
 import org.opencv.core.Mat;
 import org.opencv.core.MatOfRect;
 import org.opencv.core.Point;
 import org.opencv.core.Rect;
 import org.opencv.core.Scalar;
 import org.opencv.imgcodecs.Imgcodecs;
 import org.opencv.imgproc.Imgproc;
 import org.opencv.objdetect.CascadeClassifier;
 import scala.Tuple2;

 public class MainSparkImage
 {
    public static void main(String[] args)
    {
        SparkConf conf = new SparkConf().setMaster("spark://localhost:7077").setAppName("Image detect App");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the OpenCV native library into this (the driver) JVM
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        CascadeClassifier faceDetector = new CascadeClassifier(
                "/home/xx/Project/opencv-3.1.0/data/haarcascades_cuda/haarcascade_frontalface_alt.xml");

        File tempDir = new File("/home/xx/images/new");
        String tempDirName = tempDir.getAbsolutePath();

        // Read the image files as (path, stream) pairs, split across 3 partitions
        JavaPairRDD<String, PortableDataStream> readRDD = sc.binaryFiles(tempDirName, 3);
        List<Tuple2<String, PortableDataStream>> result = readRDD.collect();
        for (Tuple2<String, PortableDataStream> res : result)
        {
            Mat image = Imgcodecs.imread(res._1().replace("file:", ""));

            MatOfRect faceDetections = new MatOfRect();
            faceDetector.detectMultiScale(image, faceDetections);

            // Draw a green rectangle around each detected face
            for (Rect rect : faceDetections.toArray()) {
                Imgproc.rectangle(image, new Point(rect.x, rect.y),
                        new Point(rect.x + rect.width, rect.y + rect.height),
                        new Scalar(0, 255, 0));
            }
            String filename = res._1().replace("file:", "") + "_out";
            Imgcodecs.imwrite(filename, image);
        }
    }
 }

I created a jar with the above program and ran the following spark-submit command; it works fine, as expected:

./bin/spark-submit --verbose --master spark://localhost:7077 --num-executors 2 --class com.xxx.MainSparkImage --jars /home/xx/Project/opencv-3.1.0/release/bin/opencv-310.jar --driver-library-path /home/xx/Project/opencv-3.1.0/release/lib /home/xx/ImageProcess.jar

Thanks, Srivatsan

List<Tuple2<String, PortableDataStream>> result = readRDD.collect(); 

This line causes the RDD to be collected back to the driver as a local collection. The rest of the code (the for loop) then executes locally within the driver, so the executors never call into OpenCV at all. That is why you don't see any errors about missing native libraries on the executors: the only JVM that needs the native library is the driver, and --driver-library-path has made it available there.
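If you did want the detection to run on the executors, the closure shipped to them would have to load the native library itself, and the library path would have to be visible on every worker, for example via the spark.executor.extraLibraryPath setting. Below is a minimal sketch of that approach, not your original code: it assumes the OpenCV jar, native library, and cascade file exist at the same paths on each worker, and it constructs the CascadeClassifier inside the task because it wraps a native handle and is not serializable.

 // Sketch: push the OpenCV work to the executors instead of the driver.
 // Assumes spark-submit also passes, e.g.:
 //   --conf spark.executor.extraLibraryPath=/home/xx/Project/opencv-3.1.0/release/lib
 readRDD.foreach(res -> {
     // Runs on an executor; repeated loadLibrary calls with the same
     // name in the same JVM are ignored, so this is safe per task.
     System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

     CascadeClassifier detector = new CascadeClassifier(
             "/home/xx/Project/opencv-3.1.0/data/haarcascades_cuda/haarcascade_frontalface_alt.xml");

     Mat image = Imgcodecs.imread(res._1().replace("file:", ""));
     MatOfRect faces = new MatOfRect();
     detector.detectMultiScale(image, faces);
     Imgcodecs.imwrite(res._1().replace("file:", "") + "_out", image);
 });

In your original program none of this is needed, precisely because collect() moves all the data, and therefore all the OpenCV work, onto the driver.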
