A Python package with a static file dependency fails to read the static file when used within PySpark
I am trying to resolve an issue with a Python package used in PySpark. I developed a Python package with the following structure.
sample_package/
|-config/
|  |-sample.ini
|-main.py
|-__init__.py
Inside my main.py, I have a code snippet that reads the config file from the config/ directory as follows:
import ConfigParser, os

def sample_func():
    config = ConfigParser.ConfigParser()
    configfile = os.path.join(os.path.dirname(__file__), 'config', 'sample.ini')
    config.read(configfile)
    return config.sections()
I created a zip file of the above package, sample_package.zip, and included the zip as a PySpark dependency:
addPyFile(path/to/zip/file)
In my PySpark job, importing sample_package works fine and I am able to call sample_func from main, but the package is unable to read the sample.ini file. The same code works when executed in a plain Python program, just not inside a PySpark job. Does the PySpark environment manipulate paths in some way when static files are accessed? How can I get my Python package to read the config file properly?
I figured out the answer myself. It is more of a Python packaging issue than a PySpark environment issue. Because the package is imported from inside the zip, the path built from __file__ points into the archive rather than to a real file on disk, so config.read cannot open it. It looks like I had to use the pkgutil module to reference my static resources, which changes my function as follows:
import ConfigParser, os, pkgutil, StringIO

def sample_func():
    config = ConfigParser.ConfigParser()
    configfile = pkgutil.get_data('sample_package', 'config/sample.ini')
    cf_buf = StringIO.StringIO(configfile)
    config.readfp(cf_buf)
    return config.sections()
A simpler version (Python 3):
from configparser import ConfigParser
import pkgutil

def sample_func():
    config = ConfigParser()
    # os.path.join is not needed.
    config_data = pkgutil.get_data(__name__, 'config/sample.ini').decode()
    config.read_string(config_data)
    return config.sections()
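
To see the difference outside Spark, here is a minimal sketch that imports a package straight from a zip archive, which is roughly what happens once the zip is on sys.path (assuming that is how addPyFile exposes it to the job). It uses Python 3. The function names sections_via_file and sections_via_pkgutil, the section name section_a, and the temporary paths are hypothetical and only serve the illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical module source: one reader relies on __file__, the other on pkgutil.
MAIN_SOURCE = '''
import os
import pkgutil
from configparser import ConfigParser

def sections_via_file():
    config = ConfigParser()
    path = os.path.join(os.path.dirname(__file__), "config", "sample.ini")
    config.read(path)  # read() silently skips files it cannot open
    return config.sections()

def sections_via_pkgutil():
    config = ConfigParser()
    data = pkgutil.get_data("sample_package", "config/sample.ini").decode()
    config.read_string(data)
    return config.sections()
'''

# Build sample_package.zip in a temporary directory (illustrative content only).
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "sample_package.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("sample_package/__init__.py", "")
    zf.writestr("sample_package/main.py", MAIN_SOURCE)
    zf.writestr("sample_package/config/sample.ini", "[section_a]\nkey = value\n")

# Put the archive itself on sys.path so the package is imported from the zip.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
main = importlib.import_module("sample_package.main")

file_sections = main.sections_via_file()        # path points inside the zip
pkgutil_sections = main.sections_via_pkgutil()  # the zip loader returns the bytes

print(file_sections)
print(pkgutil_sections)
```

The __file__ variant returns no sections because the path it builds is not a real file on disk and ConfigParser.read ignores files it cannot open, while pkgutil.get_data asks the package's loader for the bytes and so works for both zipped and unzipped packages.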