
How do you pull subsections of a JSON string into a table in BigQuery?

(I am working in Python in Visual Studio Code and trying to load data into Google BigQuery.)

Hello,

I have pulled data from an API and dumped the JSON response into a string (using json.dumps). I have another API request that posts this data into a table I have made in BigQuery. I want to take selected values from each subsection of the JSON and put them in the corresponding columns of the BigQuery table, but can't figure out how. Each subsection contains multiple values, and I only want some of them in the table rather than all of them.

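import json
import requests

response = requests.get(api_url, headers=headers)  # api_url, headers defined elsewhere
output = response.json()                     # parsed JSON as a Python dict
output_dumps = json.dumps(output, indent=4)  # pretty-printed JSON string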

This makes the API request to my target system and parses the response into a Python object with response.json(), which I then convert to a string using json.dumps(output, indent=4). This gives me the following (sample data):

"results": [
    {
        "uuid": "sampleuniqueid1",
        "account": {
            "uuid": "abc",
            "code": "xyz"
        },
        "flowCode": {
            "uuid": "mnm",
            "code": "nmn"
        },
        "budgetCode": {
            "uuid": "ghg",
            "code": "hgh"
        },
        "date": {
            "transactionDate": "1999-01-01",
            "valueDate": "1999-01-01",
            "accountingDate": "1999-01-01",
            "updateDateTime": "1999-01-01T19:19:19Z"
        },
        "flowAmount": {
            "currency": {
                "uuid": "jkj",
                "code": "kjk"
            },
            "amount": 99999999
        },
        "accountAmount": {
            "currency": {
                "uuid": "tbt",
                "code": "btb"
            },
            "amount": 99999999
        },
        "description": null,
        "reference": null,
        "origin": "null",
        "number": 19,
        "glStatus": "null",
        "userZones": null,
        "actualMode": "null",
        "status": "null"
    },
    {
        "uuid": "sampleuniqueid2",
        "account": {
            "uuid": "rer",
            "code": "ere"
        },
        "flowCode": {
            "uuid": "lkl",
            "code": "klk"
        },
        "budgetCode": {
            "uuid": "pop",
            "code": "opo"
        },
        "date": {
            "transactionDate": "1999-01-01",
            "valueDate": "1999-01-01",
            "accountingDate": "1999-01-01",
            "updateDateTime": "1999-01-01T19:19:19Z"
        },
        "flowAmount": {
            "currency": {
                "uuid": "jkj",
                "code": "kjk"
            },
            "amount": 8888888
        },
        "accountAmount": {
            "currency": {
                "uuid": "tbt",
                "code": "btb"
            },
            "amount": 8888888
        },
        "description": null,
        "reference": null,
        "origin": "null",
        "number": 18,
        "glStatus": "null",
        "userZones": null,
        "actualMode": "null",
        "status": "null"
    },
    {
        "uuid": "sampleuniqueid3",
        "account": {
            "uuid": "fhf",
            "code": "hfh"
        },
        "flowCode": {
            "uuid": "wew",
            "code": "ewe"
        },
        "budgetCode": {
            "uuid": "pop",
            "code": "opo"
        },
        "date": {
            "transactionDate": "1999-01-01",
            "valueDate": "1999-01-01",
            "accountingDate": "1999-01-01",
            "updateDateTime": "1999-01-01T19:19:19Z"
        },
        "flowAmount": {
            "currency": {
                "uuid": "bvb",
                "code": "vbv"
            },
            "amount": 777777777
        },
        "accountAmount": {
            "currency": {
                "uuid": "aka",
                "code": "kak"
            },
            "amount": 777777777
        },
        "description": null,
        "reference": null,
        "origin": "null",
        "number": 117,
        "glStatus": "null",
        "userZones": null,
        "actualMode": "null",
        "status": "null"
    }
]
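A side note on the string conversion: response.json() already returns a Python dict (here with a "results" list), so the nested fields can be read directly, while the output of json.dumps is a plain str that cannot be indexed by key. A minimal sketch, using a hypothetical one-record, trimmed stand-in for `output`:

```python
import json

# Hypothetical stand-in for `output = response.json()` (one record, trimmed).
output = {
    "results": [
        {
            "uuid": "sampleuniqueid1",
            "budgetCode": {"uuid": "ghg", "code": "hgh"},
            "flowAmount": {"currency": {"uuid": "jkj", "code": "kjk"}, "amount": 99999999},
        }
    ]
}

# The dict can be navigated key by key; no string conversion is needed.
first = output["results"][0]
print(first["budgetCode"]["code"])              # hgh
print(first["flowAmount"]["currency"]["code"])  # kjk

# json.dumps produces text; json.loads turns that text back into an equal dict.
output_dumps = json.dumps(output, indent=4)
print(type(output_dumps).__name__)              # str
print(json.loads(output_dumps) == output)       # True
```

In other words, json.dumps is only useful for printing or saving; extraction can work on `output` itself.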

As you can see, each subsection contains multiple values (uuid, code, etc.). In BigQuery, my table is staged with this test row:

{u'uuid':'devtest1', u'account_code':'devtest1', u'flow_code':'devtest1', u'budget_code':'devtest1', 
        u'transactionDate':'devtest1', u'valueDate':'devtest1', u'accountingDate':'devtest1', u'updateDateTime':'devtest1', 
        u'flow_currency_code':'devtest1', u'flow_amount':'999.99', u'account_currency_code':'devtest1', u'account_amount':'999.999', 
        u'description':'devtest1', u'reference':'devtest1', u'origin':'devtest1', u'number':'devtest1', 
        u'glStatus':'devtest1', u'userZone1':'devtest1', u'userZone2':'devtest1', u'userZone3':'devtest1', 
        u'userZone4':'devtest1', u'userZone5':'devtest1', u'actualMode':'devtest1', u'status':'devtest1'}

I have written another API request that sends this data into BigQuery. The table columns are already named; now I just need to populate them with the correct data from the JSON I got from my system. devtest1 is a hardcoded test value, and I want to replace each devtest1 with code that pulls the corresponding value from my API response for each entry in the results section.

I also don't need every value in the JSON. For example, I need the 'code' from the budgetCode section, but not the 'uuid'. Additionally, the results section contains multiple entries. This request returned 3 results, but I plan to pull hundreds at a time, so how do I make sure each entry (identified in the JSON by its uuid: "sampleuniqueid1", "sampleuniqueid2", "sampleuniqueid3") gets its own row when inserting into BigQuery?

Using your sample JSON and expected output, I transformed your initial JSON into the rows your BigQuery table expects. This assumes that your fields don't change. See the code below:
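If some nested objects can be missing or null in real data (for example a record whose budgetCode is null), direct indexing like the loop below would raise KeyError or TypeError. One hedged alternative is to describe each column as a path of keys and walk it defensively. The names COLUMN_PATHS, dig, to_row and to_rows are illustrative, not part of any library, and only a few of the table's columns are mapped here:

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical mapping: BigQuery column name -> path of keys inside one result record.
COLUMN_PATHS: Dict[str, Tuple[str, ...]] = {
    "uuid": ("uuid",),
    "account_code": ("account", "code"),
    "budget_code": ("budgetCode", "code"),
    "transactionDate": ("date", "transactionDate"),
    "flow_currency_code": ("flowAmount", "currency", "code"),
    "flow_amount": ("flowAmount", "amount"),
}


def dig(record: Dict[str, Any], path: Tuple[str, ...]) -> Optional[Any]:
    """Follow `path` through nested dicts; return None if any step is missing or null."""
    value: Any = record
    for key in path:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def to_row(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build one flat row containing only the mapped columns."""
    return {column: dig(record, path) for column, path in COLUMN_PATHS.items()}


def to_rows(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One row per entry in "results"."""
    return [to_row(record) for record in results]


sample = [
    {"uuid": "id1", "account": {"code": "xyz"}, "budgetCode": None,
     "date": {"transactionDate": "1999-01-01"},
     "flowAmount": {"currency": {"code": "kjk"}, "amount": 99999999}},
]
print(to_rows(sample))
```

Keeping the mapping in one dict also makes it easy to add or drop columns without touching the extraction logic.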

json_data = '''
{
        "results": [{
                        "uuid": "sampleuniqueid1",
                        "account": {
                                "uuid": "abc",
                                "code": "xyz"
                        },
                        "flowCode": {
                                "uuid": "mnm",
                                "code": "nmn"
                        },
                        "budgetCode": {
                                "uuid": "ghg",
                                "code": "hgh"
                        },
                        "date": {
                                "transactionDate": "1999-01-01",
                                "valueDate": "1999-01-01",
                                "accountingDate": "1999-01-01",
                                "updateDateTime": "1999-01-01T19:19:19Z"
                        },
                        "flowAmount": {
                                "currency": {
                                        "uuid": "jkj",
                                        "code": "kjk"
                                },
                                "amount": 99999999
                        },
                        "accountAmount": {
                                "currency": {
                                        "uuid": "tbt",
                                        "code": "btb"
                                },
                                "amount": 99999999
                        },
                        "description": null,
                        "reference": null,
                        "origin": "null",
                        "number": 19,
                        "glStatus": "null",
                        "userZones": null,
                        "actualMode": "null",
                        "status": "null"
                },
                {
                        "uuid": "sampleuniqueid2",
                        "account": {
                                "uuid": "rer",
                                "code": "ere"
                        },
                        "flowCode": {
                                "uuid": "lkl",
                                "code": "klk"
                        },
                        "budgetCode": {
                                "uuid": "pop",
                                "code": "opo"
                        },
                        "date": {
                                "transactionDate": "1999-01-01",
                                "valueDate": "1999-01-01",
                                "accountingDate": "1999-01-01",
                                "updateDateTime": "1999-01-01T19:19:19Z"
                        },
                        "flowAmount": {
                                "currency": {
                                        "uuid": "jkj",
                                        "code": "kjk"
                                },
                                "amount": 8888888
                        },
                        "accountAmount": {
                                "currency": {
                                        "uuid": "tbt",
                                        "code": "btb"
                                },
                                "amount": 8888888
                        },
                        "description": null,
                        "reference": null,
                        "origin": "null",
                        "number": 18,
                        "glStatus": "null",
                        "userZones": null,
                        "actualMode": "null",
                        "status": "null"
                },
                {
                        "uuid": "sampleuniqueid3",
                        "account": {
                                "uuid": "fhf",
                                "code": "hfh"
                        },
                        "flowCode": {
                                "uuid": "wew",
                                "code": "ewe"
                        },
                        "budgetCode": {
                                "uuid": "pop",
                                "code": "opo"
                        },
                        "date": {
                                "transactionDate": "1999-01-01",
                                "valueDate": "1999-01-01",
                                "accountingDate": "1999-01-01",
                                "updateDateTime": "1999-01-01T19:19:19Z"
                        },
                        "flowAmount": {
                                "currency": {
                                        "uuid": "bvb",
                                        "code": "vbv"
                                },
                                "amount": 777777777
                        },
                        "accountAmount": {
                                "currency": {
                                        "uuid": "aka",
                                        "code": "kak"
                                },
                                "amount": 777777777
                        },
                        "description": null,
                        "reference": null,
                        "origin": "null",
                        "number": 117,
                        "glStatus": "null",
                        "userZones": null,
                        "actualMode": "null",
                        "status": "null"
                }
        ]
}
'''

import json

data_dict = json.loads(json_data)  # parse the JSON text into a Python dict
entry_arr = []                     # one flat dict per BigQuery row

# Each element of "results" becomes its own row; only the fields
# the table needs are picked out of the nested objects.
for item in data_dict["results"]:
    entry_dict = {
        "uuid": item["uuid"],
        "account_code": item["account"]["code"],
        "flow_code": item["flowCode"]["code"],
        "budget_code": item["budgetCode"]["code"],
        "transactionDate": item["date"]["transactionDate"],
        "valueDate": item["date"]["valueDate"],
        "accountingDate": item["date"]["accountingDate"],
        "updateDateTime": item["date"]["updateDateTime"],
        "flow_currency_code": item["flowAmount"]["currency"]["code"],
        "flow_amount": item["flowAmount"]["amount"],
        "account_currency_code": item["accountAmount"]["currency"]["code"],
        "account_amount": item["accountAmount"]["amount"],
        "description": item["description"],
        "reference": item["reference"],
        "origin": item["origin"],
        "number": item["number"],
        "glStatus": item["glStatus"],
        # userZones is null in every sample record, so how it maps onto
        # the userZone1..userZone5 columns is not shown by this data.
        "userZones": item["userZones"],
        "actualMode": item["actualMode"],
        "status": item["status"],
    }
    entry_arr.append(entry_dict)

# In production, load entry_arr with the BigQuery API here.
# For testing, I wrote the rows to a file and loaded it manually,
# hence the code below.

# Convert to newline-delimited JSON: one JSON object per line.
ndjson_data = "\n".join(json.dumps(record) for record in entry_arr)

# Write to file. "w" overwrites the file on each run instead of appending
# to previous output; the final newline terminates the last record.
with open("data.ndjson", "w") as f:
    f.write(ndjson_data + "\n")
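Before loading the file, a quick local check can catch malformed lines or keys that do not match the table's columns (for instance, a userZones key would be flagged if the table only has userZone1 to userZone5). A minimal sketch that works on the file's text; check_ndjson and the expected-column set are hypothetical and should be adjusted to the real schema:

```python
import json
from typing import List, Set


def check_ndjson(text: str, expected: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means every line parsed with exactly the expected keys."""
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        keys = set(record)
        missing = expected - keys
        extra = keys - expected
        if missing:
            problems.append(f"line {lineno}: missing {sorted(missing)}")
        if extra:
            problems.append(f"line {lineno}: unexpected {sorted(extra)}")
    return problems


good = '{"uuid": "id1", "budget_code": "hgh"}\n'
bad = '{"uuid": "id2", "userZones": null}\n{"uuid": '
print(check_ndjson(good, {"uuid", "budget_code"}))
print(check_ndjson(bad, {"uuid", "budget_code"}))
```

With the real file, the text would come from `open("data.ndjson").read()` and `expected` from the table's column names.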

Snippet of data in BQ: [screenshot not available]

NOTE: I manually loaded the generated newline-delimited JSON file just to check that the transformed data loads into BQ without errors. You can insert your BQ code right after the JSON is parsed.
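The note above says the BigQuery step can go right after parsing. A hedged sketch of that step, assuming the google-cloud-bigquery client library is installed, credentials are already configured, and the table exists; the table id is a placeholder, chunked and insert_rows are illustrative helper names, and the batch size of 500 is an assumption. Client.insert_rows_json is the library's streaming-insert method:

```python
from typing import Any, Dict, Iterator, List


def chunked(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_rows(rows: List[Dict[str, Any]], table_id: str, batch_size: int = 500) -> None:
    """Stream rows into an existing table ("project.dataset.table"), one request per batch."""
    # Imported here so chunked() can be used without the client library installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # picks up Application Default Credentials
    for batch in chunked(rows, batch_size):
        # Using each record's uuid as its row id lets BigQuery
        # de-duplicate retried inserts on a best-effort basis.
        errors = client.insert_rows_json(
            table_id, batch, row_ids=[row["uuid"] for row in batch]
        )
        if errors:
            raise RuntimeError(f"BigQuery rejected rows: {errors}")


# Usage, with entry_arr from the code above and a hypothetical table id:
# insert_rows(entry_arr, "my-project.my_dataset.my_table")
```

Each dict in entry_arr becomes one row, so hundreds of results simply become hundreds of rows. An alternative is a load job via client.load_table_from_json, which avoids the streaming buffer; either way, the row keys must match the table's column names and types.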
