
Queries on Databricks compared to an HDInsight cluster

I have a few queries regarding the Databricks implementation compared to an HDInsight cluster.

  1. Currently, a few Python files run from /bin/ in the HDInsight cluster. Is there a way to upload the same Python files to /bin in Databricks?

    I am treating /FileStore/tables/ in Databricks as the equivalent of /bin and have uploaded the Python files there.

    A few files execute, but when a .sh script refers to PATH=.:$PATH, it fails there saying the script is not found.

    When I run the following in a Databricks Python notebook:

        %sh
        PATH=".:$PATH"
        echo $PATH

    it prints: .:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

    But I cannot actually see those directories anywhere in Databricks.

    My thought is to define the path [ dbfs/FileStore/tables ] explicitly in the bash script, as sketched below. Or is there a better way of doing it?
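
    A minimal sketch of that idea, assuming the files live under /FileStore/tables and that the cluster exposes DBFS through the /dbfs FUSE mount (the script name run_job.sh is a placeholder, not from the question):

        #!/bin/bash
        # Prepend the DBFS FUSE path where the uploaded files live
        # (/dbfs/FileStore/tables is an assumption based on the upload location above).
        export PATH="/dbfs/FileStore/tables:$PATH"

        # An executable script uploaded there can now be resolved by name via PATH,
        # e.g. (placeholder name):
        run_job.sh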

  2. In the bash script, how can I explicitly define the path where my actual scripts are present in Databricks?

    In the HDInsight cluster, it points to /bin when it executes the bash script with PATH=.:$PATH.

    Is there a way to do the same in Databricks?

  3. Is there a way to change the Upload option to point to a directory other than /FileStore/tables, which Databricks defaults to?

  4. In Databricks, is there a way [ like a mount point to ADLS/Blob ] to mount /FileStore/tables/ as bin and upload all the Python files to that path?

1 & 2. You can set environment variables that scripts running on a cluster can access:

    • On the cluster configuration page, click the Advanced Options toggle.

    • Click the Spark tab.

    • Set the environment variables in the Environment Variables field (see the example below).

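For example, assuming a hypothetical variable SCRIPTS_DIR pointing at the upload location (neither the name nor the path is from the original answer), the Environment Variables field takes one KEY=value pair per line, and a shell cell on the cluster can then use it:

    # In the Environment Variables field (one per line), e.g.:
    #   SCRIPTS_DIR=/dbfs/FileStore/tables

    # Then, in a %sh notebook cell or a script running on the cluster:
    export PATH="$SCRIPTS_DIR:$PATH"
    echo $PATH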

You can also set environment variables using the spark_env_vars field in the Create cluster request or Edit cluster request Clusters API endpoints, for example:
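
A sketch of such a request against the Clusters API 2.0; the workspace URL, token, and cluster settings below are placeholders:

    # Create a cluster with spark_env_vars set (placeholders throughout).
    curl -X POST https://<databricks-instance>/api/2.0/clusters/create \
      -H "Authorization: Bearer <personal-access-token>" \
      -H "Content-Type: application/json" \
      -d '{
        "cluster_name": "demo-cluster",
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
        "spark_env_vars": { "SCRIPTS_DIR": "/dbfs/FileStore/tables" }
      }'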

3 & 4. FileStore/jars is a special folder within Databricks File System, where you upload libraries. More details, refer " Databricks - Libraries ".

Once the jar files are uploaded to FileStore/jars, you can call those libraries in an init script, for example:
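
A minimal cluster-scoped init-script sketch; the jar name is a placeholder, and copying into /databricks/jars is one common way to put a jar on the cluster classpath:

    #!/bin/bash
    # Init script: runs on every node when the cluster starts.
    # Copy an uploaded library from DBFS onto the local classpath directory.
    cp /dbfs/FileStore/jars/my-library.jar /databricks/jars/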

Hope this helps.
