AWS Glue Job Wrong boto3 Version

I'm trying to run the latest version of boto3 in an AWS Glue Spark job to access methods that aren't available in Glue's default version.

To get the default version of boto3 and verify that the method I want isn't available, I run this block of code, which is all boilerplate except for my print statements:

import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

athena = boto3.client('athena')
print(boto3.__version__) # verify the default version boto3 imports
print(athena.list_table_metadata) # method I want to verify I can access in Glue

job.commit()

which returns

1.12.4

Traceback (most recent call last):
  File "/tmp/another_sample", line 20, in <module>
    print(athena.list_table_metadata)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 566, in __getattr__
    self.__class__.__name__, item)
AttributeError: 'Athena' object has no attribute 'list_table_metadata'

OK, that's expected with an older version of boto3. Let's try to import the latest version...

I perform the following steps:

  1. Go to https://pypi.org/project/boto3/#files
  2. Download the boto3-1.17.13-py2.py3-none-any.whl file
  3. Place it in an S3 location
  4. Go back to the Glue job and, under the "Security configuration, script libraries, and job parameters (optional)" section, update the Python library path with the S3 location from step 3
  5. Rerun the block of code from above

which returns

1.17.9

Traceback (most recent call last):
  File "/tmp/another_sample", line 20, in <module>
    print(athena.list_table_metadata)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 566, in __getattr__
    self.__class__.__name__, item)
AttributeError: 'Athena' object has no attribute 'list_table_metadata'

If I run this same script locally, where boto3 is also at 1.17.9, I can find the method:

1.17.9

<bound method ClientCreator._create_api_method.<locals>._api_call of <botocore.client.Athena object at 0x7efd8a4f4710>>

Any ideas on what's going on here, and how can I access the methods I would expect the upgraded version to provide?
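One way to narrow this down is to print where each package is loaded from, not just the boto3 version. boto3 builds its client methods from the service definitions shipped with botocore, so a new boto3 sitting on top of an old botocore could plausibly explain a missing method. The sketch below is only a diagnostic under that assumption; `describe_module` is a hypothetical helper name, not part of boto3 or Glue.

```python
import importlib


def describe_module(name):
    """Return (version, file path) for an importable module, or None if it is missing.

    Hypothetical helper for illustration; it only reads attributes that
    modules conventionally expose (__version__ and __file__).
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "unknown")
    return version, path


# Print both packages so a boto3/botocore mismatch shows up in the job log.
for package in ("boto3", "botocore"):
    print(package, describe_module(package))
```

If the two paths point at different site-packages directories, or botocore reports a much older version than boto3, that would be consistent with the behaviour described above.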

I ended up finding a workaround in the AWS documentation.

I added the following key/value pair to the Glue job parameters, under the "Security configuration, script libraries, and job parameters (optional)" section of the job:

Key:

--additional-python-modules

Value:

botocore>=1.20.12,boto3>=1.17.12
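The same key/value pair can also be passed when starting a run from code rather than through the console. The following is a minimal sketch, assuming the job already exists and the caller has permission to start it; `with_additional_modules` and `start_run` are hypothetical helper names, and the job name is a placeholder.

```python
def with_additional_modules(arguments, modules):
    """Return a copy of the job arguments with --additional-python-modules set.

    The modules are joined with commas, matching the value format shown above.
    """
    merged = dict(arguments)
    merged["--additional-python-modules"] = ",".join(modules)
    return merged


def start_run(job_name, modules):
    """Start a Glue job run with the extra modules (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    glue = boto3.client("glue")
    return glue.start_job_run(
        JobName=job_name,
        Arguments=with_additional_modules({}, modules),
    )


# Build the arguments only; no AWS call is made here.
job_arguments = with_additional_modules(
    {}, ["botocore>=1.20.12", "boto3>=1.17.12"]
)
print(job_arguments)
```

Calling `start_run("my-glue-job", [...])` with a real job name would submit the run; pinning botocore alongside boto3 mirrors the value used in the workaround.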
