
Accessing Delta Lake through PySpark on EMR notebooks

I have a question about using external libraries such as delta-core with AWS EMR notebooks. Currently there is no mechanism for installing the delta-core library as a PyPI package. The available options include:

  1. Launching the PySpark kernel with the --packages option.
  2. Changing the packages option in the Python script through an os environment configuration, but I don't see it downloading the packages, and I still get an ImportError on `import delta.tables`.
  3. Downloading the JARs manually, but there doesn't appear to be any option for this on EMR notebooks.
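For reference, option 2 is usually attempted by setting `PYSPARK_SUBMIT_ARGS` from Python. A minimal sketch is below; the Delta Lake coordinate (`io.delta:delta-core_2.12:0.8.0`) is an assumed example and must match your Spark version. Note that this variable is only read when the JVM launches, so it has no effect if a Spark context already exists, which is consistent with the ImportError described above.

```python
import os

# Sketch of option 2: configure the packages option via the environment.
# The Delta Lake version below is an assumption; pick the coordinate
# matching the Spark version on your EMR release.
# This must run BEFORE any SparkSession/SparkContext is created -- the
# variable is only read at JVM startup, so setting it in a notebook
# whose Spark context is already running changes nothing.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages io.delta:delta-core_2.12:0.8.0 pyspark-shell"
)

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```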

Has anyone tried this out before?

  1. You can download the JARs while creating the EMR cluster using a bootstrap script.
  2. You can place the JARs in S3 and pass them to PySpark with the --jars option.
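For the second suggestion, inside an EMR notebook the equivalent of --jars is the Sparkmagic `%%configure` cell magic, run as the first cell before the Spark session starts. A minimal sketch, assuming a hypothetical bucket and JAR path:

```
%%configure -f
{ "conf": { "spark.jars": "s3://my-bucket/jars/delta-core_2.12-0.8.0.jar" } }
```

The bucket name and Delta Lake version here are placeholders; `-f` forces the session to restart so the new configuration takes effect.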
