Loading JSON from S3 to Redshift

I have the following JSON data in an S3 bucket:

{
"campaigns": [
{"campaign_reach": 123456, 
"campaign_spend": 123456.0, 
"campaign_goal": 12345678, 
"id": "cda05a432b3b44c18c009a4a961f644a", 
"campaign_name": "Campaign1", 
"publisher_name": "PublisherA", 
"campaign_impressions": 123456}], 
"line_items": [], 
"podcasts": [
{"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"}, 
{"podcast_name": "PodcastB", "id": "fc3f2dc4c20949edaaf2186613ec7e47"}]
}
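The campaign records in this document are not at the top level. They sit one level down, inside the `campaigns` array. A quick check with Python's standard `json` module makes the shape explicit (the sample is trimmed to a few fields for brevity):

```python
import json

# Sample document from the question (campaign fields trimmed for brevity).
RAW = """
{"campaigns": [{"id": "cda05a432b3b44c18c009a4a961f644a",
                "campaign_name": "Campaign1",
                "campaign_reach": 123456}],
 "line_items": [],
 "podcasts": [{"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"}]}
"""

doc = json.loads(RAW)

# The top level holds three arrays, not campaign fields.
top_level_keys = sorted(doc.keys())
print(top_level_keys)  # ['campaigns', 'line_items', 'podcasts']

# Each campaign is an object inside the "campaigns" array.
campaigns = doc["campaigns"]
print(len(campaigns), campaigns[0]["campaign_name"])  # 1 Campaign1
```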

I am using COPY to load the "campaigns" portion into a table in Redshift.

I have tried loading with a jsonpaths file:

query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 's3://mybucket/campaign_jsonpaths.json'
;"""
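Concatenating access keys into the SQL string works, but the statement is easier to read and audit when it is assembled in one place. The following is a minimal sketch built around a hypothetical helper, `build_copy_sql`. It uses the `iam_role` form that the answer below also uses, and the role ARN is only a placeholder.

```python
def build_copy_sql(table: str, source: str, iam_role: str, json_option: str) -> str:
    """Assemble a Redshift COPY ... JSON statement (hypothetical helper).

    json_option is either 'auto' or the S3 URI of a jsonpaths file.
    """
    # A single quote inside any literal would break the statement, so refuse it.
    for value in (source, iam_role, json_option):
        if "'" in value:
            raise ValueError(f"unexpected quote in {value!r}")
    return (
        f"copy {table}\n"
        f"from '{source}'\n"
        f"iam_role '{iam_role}'\n"
        f"json '{json_option}';"
    )


sql = build_copy_sql(
    "myschema.campaigns",
    "s3://mybucket/mapping.json",
    "arn:aws:iam::1234567890:role/Redshift-Role",  # placeholder ARN
    "s3://mybucket/campaign_jsonpaths.json",
)
print(sql)
```

Passing `"auto"` as `json_option` produces the `json 'auto'` variant.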

My jsonpaths file, "campaign_jsonpaths.json":

{
    "jsonpaths": [
        "$['id']",
        "$['campaign_name']",
        "$['campaign_reach'][0]",
        "$['campaign_spend']",
        "$['campaign_goal']",
        "$['campaign_impressions']",
        "$['publisher_name']",
    ]
}
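Each path in a jsonpaths file is evaluated from the root of the JSON document. So `$['id']` looks for an `id` key at the top level, and this document has no such key. `$['campaign_reach'][0]` fails the same way, and it would also index into a number even if that key existed. The trailing comma after the last path is not valid JSON either.

The sketch below illustrates which paths resolve against the sample. It is a simplified stand-in for bracket-notation JSONPath, not Redshift's own parser. `resolve` is a hypothetical helper that handles only `['key']` and `[index]` steps.

```python
import json
import re

# Trimmed sample from the question.
DOC = json.loads(
    '{"campaigns": [{"id": "cda05a432b3b44c18c009a4a961f644a",'
    ' "campaign_name": "Campaign1", "campaign_reach": 123456}],'
    ' "line_items": [], "podcasts": []}'
)

# Matches one bracket step: ['name'] or [0].
STEP = re.compile(r"\['([^']*)'\]|\[(\d+)\]")


def resolve(doc, path):
    """Follow a $['a'][0]['b'] style path; return None if a step is missing or null."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with $: {path}")
    node = doc
    pos = 1
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax in {path!r} at {pos}")
        key, index = m.group(1), m.group(2)
        if key is not None:
            node = node.get(key) if isinstance(node, dict) else None
        else:
            i = int(index)
            node = node[i] if isinstance(node, list) and i < len(node) else None
        if node is None:
            return None
        pos = m.end()
    return node


# Paths from the question vs. paths rooted at the campaigns array.
for p in ["$['id']", "$['campaign_reach'][0]",
          "$['campaigns'][0]['id']", "$['campaigns'][0]['campaign_reach']"]:
    print(p, "->", resolve(DOC, p))
```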

I have also tried using json 'auto':

query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 'auto'
;"""
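With `json 'auto'`, COPY matches the first-level keys of each JSON object in the source to the target table's column names. In this file, the only top-level object has the keys `campaigns`, `line_items`, and `podcasts`. None of them is a column of the campaigns table. The minimal sketch below models only that name matching, not Redshift's full loading behavior. It assumes the column list from the `create table` in the answer below.

```python
import json

# Columns of the campaigns table (taken from the answer's CREATE TABLE).
COLUMNS = ["id", "campaign_name", "campaign_reach", "campaign_spend",
           "campaign_goal", "campaign_impressions", "publisher_name"]

# Trimmed sample from the question: a single top-level object.
doc = json.loads('{"campaigns": [{"id": "cda05a432b3b44c18c009a4a961f644a"}],'
                 ' "line_items": [], "podcasts": []}')


def matching_columns(record, columns):
    """Return the columns whose names appear as first-level keys of record."""
    return [c for c in columns if c in record]


print(matching_columns(doc, COLUMNS))                  # []
print(matching_columns(doc["campaigns"][0], COLUMNS))  # ['id']
```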

Both run successfully, but the table in Redshift is empty. There are no errors in stl_load_errors.

I found a similar post here, but it has no answers: Redshift: copy command Json data from s3

Any help would be much appreciated.

I was able to load the table successfully by doing the following:

  1. Created a campaigns table based on your JSON data:

    create table campaigns ( id varchar(100), campaign_name varchar(100), campaign_reach int, campaign_spend float, campaign_goal int, campaign_impressions int, publisher_name varchar(100) );

  2. Created a mapping.json file containing your JSON data.

  3. Created a campaigns_jsonpaths.json file as follows:

    {
      "jsonpaths": [
        "$['campaigns'][0]['id']",
        "$['campaigns'][0]['campaign_name']",
        "$['campaigns'][0]['campaign_reach']",
        "$['campaigns'][0]['campaign_spend']",
        "$['campaigns'][0]['campaign_goal']",
        "$['campaigns'][0]['campaign_impressions']",
        "$['campaigns'][0]['publisher_name']"
      ]
    }

  4. Ran COPY:

    copy campaigns from 's3://<bucket>/mapping.json' iam_role 'arn:aws:iam::1234567890:role/Redshift-Role' json 's3://<bucket>/campaigns_jsonpaths.json';

The records were loaded successfully into the campaigns table.
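Every path in that jsonpaths file indexes `[0]`, so only the first element of `campaigns` is loaded. Redshift's JSONPath expressions do not support wildcards, so a single path cannot select every array element. If the file can hold more than one campaign, one option is to rewrite it before loading so that each line holds one JSON object per campaign. `json 'auto'` can then map those objects to columns by key name.

The following is a minimal sketch of that rewrite, assuming a hypothetical `campaigns_to_ndjson` helper. The second campaign in the sample is made-up data added only to show one line per element. Uploading the result to S3 is left out.

```python
import json

# Trimmed sample from the question, plus a second, hypothetical campaign
# so that the output shows one line per array element.
DOC = {
    "campaigns": [
        {"id": "cda05a432b3b44c18c009a4a961f644a", "campaign_name": "Campaign1",
         "campaign_reach": 123456, "campaign_spend": 123456.0},
        {"id": "hypothetical-id-2", "campaign_name": "Campaign2",
         "campaign_reach": 1, "campaign_spend": 2.5},
    ],
    "line_items": [],
    "podcasts": [],
}


def campaigns_to_ndjson(doc):
    """Return newline-delimited JSON: one object per element of doc['campaigns']."""
    return "".join(json.dumps(c) + "\n" for c in doc["campaigns"])


ndjson = campaigns_to_ndjson(DOC)
print(ndjson, end="")
```

To load the rewritten file, use the same `copy campaigns from ... iam_role ... json 'auto';` form, with `from` pointing at the new file. No jsonpaths file is needed in that case.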
