
Accessing delta lake through Pyspark on EMR notebooks

I have a question about using external libraries such as delta-core on AWS EMR notebooks. Currently there is no mechanism for installing the delta-core libraries through pypi packages. The available options include:

  1. Launching the pyspark kernel with the --packages option.
  2. Changing the packages option in the Python script through the os configuration, but I don't see it actually download the packages, and I still get an import error on `import delta.tables`.
  3. Downloading the JARs manually, but there doesn't appear to be any option for this on EMR notebooks.

Has anyone tried this out before?
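For context on why option 2 is tricky: on EMR notebooks the Spark session is managed by Livy through Sparkmagic, so session-level settings have to be declared before the session starts. One approach commonly reported to work is a `%%configure` cell as the first cell of the notebook (a sketch; the delta-core version below is an assumption, and the `_2.12` suffix must match your cluster's Scala/Spark build):

```
%%configure -f
{
  "conf": {
    "spark.jars.packages": "io.delta:delta-core_2.12:1.0.0",
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
  }
}
```

The `-f` flag forces the Livy session to restart with the new configuration, which is what makes the package download actually happen.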

  1. You can download the jars while creating the EMR cluster using bootstrap scripts.
  2. You can place the jars in S3 and pass them to pyspark with the --jars option.
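A minimal sketch of the second suggestion, assuming the jar has already been uploaded to an S3 bucket (the bucket name, jar path, and version below are placeholders, not values from the question). The key detail is that the submit arguments must be set before the first SparkSession is created, because they are consumed when the JVM is launched; setting them after the kernel's session already exists is a likely reason the import still fails.

```python
import os

# Must run before any SparkSession/JVM is started. Bucket and
# delta-core version are hypothetical placeholders.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars s3://my-bucket/jars/delta-core_2.12-1.0.0.jar "
    "--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension "
    "pyspark-shell"
)

# Only after this point should the session be built, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# from delta.tables import DeltaTable
```

The trailing `pyspark-shell` token is required: PySpark appends it when it parses `PYSPARK_SUBMIT_ARGS`, and omitting it causes the options to be rejected.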
