
AWS Glue psycopg2 installation

I'm trying to run code that uses psycopg2 to manipulate a Redshift instance. I tried importing a wheel file, since wheels are supported in Glue Python jobs. I can see the library being installed on the endpoint when the job runs, but then I get an error:

import boto3
import psycopg2
Aug 4, 2020, 1:24:06 PM Pending execution
Processing ./glue-python-libs-92ng4pcb/psycopg2-2.8.5-cp36-cp36m-win_amd64.whl
Installing collected packages: psycopg2
Successfully installed psycopg2-2.8.5
Considering file without prefix as a python extra file s3://gluelibraries/boto3.zip
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

2020-08-04T13:24:44.831+02:00
Traceback (most recent call last):
  File "/tmp/runscript.py", line 123, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-1t08aq9n/postloading.py", line 6, in <module>
  File "/glue/lib/installation/psycopg2/__init__.py", line 51, in <module>
    from psycopg2._psycopg import (                     # noqa
ModuleNotFoundError: No module named 'psycopg2._psycopg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 142, in <module>
    raise e_type(e_value).with_traceback(new_stack)
  File "/tmp/glue-python-scripts-1t08aq9n/postloading.py", line 6, in <module>
  File "/glue/lib/installation/psycopg2/__init__.py", line 51, in <module>
    from psycopg2._psycopg import (                     # noqa
ModuleNotFoundError: No module named 'psycopg2._psycopg'

In theory, Glue Python jobs (unlike PySpark jobs) should support libraries that are not pure Python.

Based on https://stackoverflow.com/a/58305654/4725074

Install psycopg2-binary into a directory and zip up the contents of that directory:

mkdir psycopg2-binary
cd psycopg2-binary
pip install psycopg2-binary -t  .
# in case using python3:
# python3 -m pip install --system psycopg2-binary -t  .
zip -r9 psycopg2.zip *

I then copied psycopg2.zip to an S3 bucket and added it as an extra Python library under "Python library path" in the Glue Spark job.

I then launched the job with the following script to verify that psycopg2 is present (Glue downloads the zip file into the directory in which the job script is located):

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import sys
import os
import zipfile

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

zip_ref = zipfile.ZipFile('./psycopg2.zip', 'r')
print(os.listdir('.'))
zip_ref.extractall('/tmp/packages')
zip_ref.close()
sys.path.insert(0, '/tmp/packages')

import psycopg2
print(psycopg2.__version__)

job.commit()

This worked for me.
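The extract-then-import step in the script above can be wrapped in a small helper. This is only a sketch: the function name add_zip_to_path is hypothetical, and the default /tmp/packages directory is taken from the script above. The extraction matters because Python can import pure-Python code straight from a zip archive, but compiled extension modules such as psycopg2's _psycopg have to sit on the real filesystem.

```python
import os
import sys
import zipfile


def add_zip_to_path(zip_path, target_dir="/tmp/packages"):
    """Extract a zipped library and make it importable.

    Hypothetical helper: the name and default directory are assumptions
    for illustration, not part of any Glue API.
    """
    os.makedirs(target_dir, exist_ok=True)
    # Unpack everything so compiled extension files end up on disk.
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
    # Put the extracted directory first so it wins over other copies.
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

In the job script this would be called as add_zip_to_path('./psycopg2.zip') before the import psycopg2 line.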

Now with Glue version 2 you can pass Python libraries as parameters to Glue jobs. I used psycopg2-binary instead of psycopg2 and it worked for me. Then in the code I did import psycopg2.

--additional-python-modules
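As a rough sketch of how that parameter might be assembled in code: the value is a comma-separated list of pip-style requirements. The helper name additional_modules_argument and the pinned version below are assumptions for illustration only.

```python
def additional_modules_argument(modules):
    """Build the job argument that asks Glue to pip-install extra packages.

    Hypothetical helper; only the '--additional-python-modules' key comes
    from the answer above.
    """
    # Drop empty entries and stray whitespace before joining.
    cleaned = [m.strip() for m in modules if m.strip()]
    return {"--additional-python-modules": ",".join(cleaned)}


# Example value; the version pin is an assumption, not a recommendation.
job_arguments = additional_modules_argument(["psycopg2-binary==2.8.5"])
print(job_arguments)
```

The resulting dictionary could then be supplied as the job's default arguments, for example through the job parameters in the console or the DefaultArguments field of boto3's create_job call.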

I faced a similar issue with the psycopg2 package. It has to do with compatibility between the psycopg2 module and the Python runtime that loads it.

Follow this thread; hopefully it leads you to a solution: Using psycopg2 with Lambda to Update Redshift (Python)

Instead of psycopg2, try pg8000, which is easy to install and has no C dependencies. Also, Amazon uses it in most of their internal projects.

After trying pg8000 with a Python endpoint I got the following error:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 123, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-j7khvbvv/postloading.py", line 7, in <module>
ModuleNotFoundError: No module named 'pg8000'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 142, in <module>
    raise e_type(e_value).with_traceback(new_stack)
  File "/tmp/glue-python-scripts-j7khvbvv/postloading.py", line 7, in <module>
ModuleNotFoundError: No module named 'pg8000'

When using a PySpark endpoint I don't have this problem with pg8000.
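Since the set of available drivers seems to differ between job types, one way to find out what a given environment can actually load is to try each candidate in turn. The sketch below is an assumption-laden illustration (the function name first_importable is hypothetical). It performs a real import rather than just looking the package up, so a psycopg2 whose compiled _psycopg part is missing is also skipped.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first candidate that imports cleanly.

    Hypothetical helper for illustration. ModuleNotFoundError is a subclass
    of ImportError, so both tracebacks shown in this thread would be caught.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # Not installed, or installed but broken: try the next one.
        return name, module
    return None, None


driver_name, driver = first_importable(["psycopg2", "pg8000"])
print("usable PostgreSQL driver:", driver_name)
```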

I downloaded the wheel named psycopg2-2.9.1-cp36-cp36m-linux_x86_64.whl from this link and the problem was solved. Thanks.
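This fits the log in the question, where the installed wheel was tagged win_amd64 while the traceback paths (/usr/local/lib/python3.6) suggest the job runs on Linux. A wheel's filename ends with its Python, ABI, and platform tags, so a quick check before uploading is possible. The sketch below is a simplified reading of that naming convention, and the helper names are hypothetical.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout is name-version[-build]-python-abi-platform, so the last
    # three fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def targets_linux(filename):
    """True if the wheel is platform-independent or built for Linux."""
    platform_tag = parse_wheel_tags(filename)["platform"]
    return platform_tag == "any" or "linux" in platform_tag


for wheel in (
    "psycopg2-2.8.5-cp36-cp36m-win_amd64.whl",
    "psycopg2-2.9.1-cp36-cp36m-linux_x86_64.whl",
):
    print(wheel, "->", "Linux-compatible" if targets_linux(wheel) else "wrong platform")
```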
