
Unable to import pyspark to python

I have a pyspark script (p1) that creates dataframes and returns a dataframe. It is imported into a different python script (p2). When I run p1 directly, the script executes successfully; however, when I run p2, it fails with "no module found p1". I have imported p1 into the p2 script.

Please advise.

Pass the python script using the --py-files argument

  • If you are using the pyspark REPL, add the py-files conf and pass the path to your file.
  • If you're submitting the job using spark-submit, add the --py-files argument (see the sketch after this list).
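
A minimal sketch of both options, assuming the module to ship is a file named p1.py and the driver script is p2.py (names taken from the question):

# Option 1: spark-submit ships p1.py alongside the driver script
#   spark-submit --py-files p1.py p2.py

# Option 2: from the pyspark REPL or inside an existing SparkSession,
# add the file at runtime so it can be imported
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("p2").getOrCreate()
spark.sparkContext.addPyFile("p1.py")   # ships p1.py to the executors and puts it on the path

import p1   # import after addPyFile so the module can be found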

Sorry, I was not clear about what I have been doing in the message above.

p1:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def func(query):
    df = spark.sql(query)
    return df

p2:
import p1
df2 = p1.func('select * from tab')
df2.show()

Then, when I run p2 as python3 p2.py, it says "Module p1 not found".

When I run p1 by itself, it works.

I removed the spark-related commands and put in a few plain python functions such as print('abcd'), and then it works fine. So I am missing something related to importing pyspark.
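
A quick sanity check (just a sketch, assuming p1.py and p2.py sit in the same directory and p2 is started with python3 p2.py) to confirm that the interpreter can see both pyspark and the directory holding p1.py:

import sys

print(sys.executable)   # which Python binary is running p2
print(sys.path[0])      # directory of the script being run; p1.py should live here

import pyspark          # fails here if this interpreter has no pyspark installed
print(pyspark.__version__)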


 