
How to Pass Variable from EMR Cluster to Jupyter Notebook %%local Instance?

How do I use a variable defined in the EMR cluster's Python instance when I run code on the managed Jupyter notebook instance using %%local?

Specifically, I want to use matplotlib, as shown in this question, to display a plot from a dataframe generated using spark.sql(). Using %%sql lets me easily use the query results in %%local, but I would still need to pass parameters to %%sql from the EMR Python instance.

Example:

In [1]: parameter = 'Hello parameter'

In [2]: %%local
        print(parameter)

I keep getting an error saying that my variable is not defined.

I found two workarounds:

  • Use %%spark -o df to return the SQL query results as a dataframe that can be used with %%local, as in this answer.
  • Do all query building, execution, and data processing as normal, without any %% magic commands, then register the final data as a temporary view with df.createOrReplaceTempView("temp_table_name"). Then retrieve the final data with %%sql -q -o df and a simple query: SELECT * FROM temp_table_name.
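Both workarounds move a dataframe, not a plain variable, so a scalar such as parameter from the example has to be wrapped in a small dataframe before -o can carry it across. Below is a minimal sketch of that idea, not something I have run on EMR. The helper names pack_params and unpack_params are hypothetical (they are not part of sparkmagic), and they would need to be defined once in a cluster cell and once in a %%local cell. The notebook cells are shown as comments and assume the spark session that the notebook provides.

```python
import json


def pack_params(**params):
    """Cluster side: turn variables into (name, json_value) rows for a tiny dataframe."""
    return [(name, json.dumps(value)) for name, value in params.items()]


def unpack_params(rows):
    """Local side: rebuild the variables from an iterable of (name, json_value) rows."""
    return {name: json.loads(value) for name, value in rows}


# Assumed notebook usage (hypothetical cell layout, shown as comments):
#
#   %%spark -o params_df
#   parameter = 'Hello parameter'
#   params_df = spark.createDataFrame(pack_params(parameter=parameter), ["name", "value"])
#
#   %%local
#   # params_df arrives here as a pandas dataframe
#   parameter = unpack_params(params_df.itertuples(index=False))["parameter"]
#   print(parameter)

# Plain-Python round trip, so the helpers can be checked without a cluster.
rows = pack_params(parameter="Hello parameter", limit=10)
restored = unpack_params(rows)
print(restored["parameter"])
```

JSON encoding keeps every value as a string column on the Spark side, so mixed types (strings, numbers, lists) can share one dataframe. It only works for JSON-serializable values.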

