
Load JSON from S3 inside an AWS Glue PySpark job

I am trying to retrieve a JSON file from an S3 bucket inside a Glue PySpark script.

I am running this function in the job inside AWS Glue:

def run(spark):
    s3_bucket_path = 's3://bucket/data/file.gz'

    df = spark.read.json(s3_bucket_path)
    df.show()

After this I am getting: AnalysisException: u'Path does not exist: s3://bucket/data/file.gz;'

I searched for this issue and did not find anything similar enough to infer where the problem lies. I suspect there might be permission issues accessing the bucket, but then the error message should be different.

Here you can try this:

    import json
    import boto3

    # Credentials left blank intentionally; prefer the Glue job's IAM role.
    s3 = boto3.client("s3", region_name="us-west-2",
                      aws_access_key_id="", aws_secret_access_key="")
    jsonFile = s3.get_object(Bucket=bucket, Key=key)
    jsonObject = json.load(jsonFile["Body"])

where Key is the full path to your file in the bucket,

and use this jsonObject in spark.read.json(jsonObject).
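Note that spark.read.json does not accept a parsed Python dict directly; it expects a path or an RDD of JSON strings. One way to hand the downloaded object to Spark is to re-serialize it and parallelize it (a sketch; to_json_lines is a hypothetical helper, and jsonObject and spark come from the answer above):

```python
import json

def to_json_lines(obj):
    """Serialize a parsed JSON value into one-string-per-record form,
    which spark.read.json can consume via an RDD of strings."""
    records = obj if isinstance(obj, list) else [obj]
    return [json.dumps(r) for r in records]

# Inside the Glue job (spark is the job's SparkSession):
# df = spark.read.json(spark.sparkContext.parallelize(to_json_lines(jsonObject)))
# df.show()
```

This sidesteps the path resolution entirely, at the cost of pulling the whole file through the driver, so it only suits small files.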

