
How to parse JSON from a URL and download CSV files?

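I've been given a URL that returns some JSON text. Inside that text are the URLs of CSV files. I'm trying to parse the JSON from the URL and download the CSV files. I can print the JSON from the URL, but I don't know how to get the CSV files out of it.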

import urllib, json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()
print(s)

The code above prints the JSON from the URL (see the output below). It contains the URLs of the CSV files, which I then need to download with Python.

{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}
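Rather than printing the raw bytes, the response can be handed straight to `json.load`, which accepts any file-like object (the object returned by `urlopen` is one). A minimal sketch; `fetch_report` is a hypothetical helper name, not something from the question:

```python
import json
import urllib.request


def fetch_report(url):
    """Fetch a JSON document from url and return it as Python objects.

    json.load reads from any file-like object, and the response returned by
    urlopen is file-like, so there is no need to call .read() first.
    """
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, `fetch_report("http://staging.test.com/api/reports/68.json?auth_token=test")["results"]` would give the list of result dicts shown above, each with its CSV URL under `"content"`.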
import json
from collections import namedtuple

# This is your "s" (i.e. data = s); a sample JSON string is used here instead.
data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
print(x.name, x.hometown.name, x.hometown.id)

This approach comes from "How to convert JSON data into a Python object": it loads the JSON into an object, and you can then access the values through the keys from the JSON.

for r in x.results:
    print(r.content)

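Of course, you will have to adapt the code to make it work exactly the way you want. I'm not really a Python expert and have nothing to test this against, but the idea is to load the JSON into a namedtuple object and access the values by key.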
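With the report JSON from the question, the attribute path differs from the John Smith example: `content` belongs to each entry of the `results` list, not to the top-level object. A sketch of the same `object_hook` idea on a trimmed copy of that payload (two results only); `as_namedtuple` is a hypothetical name, and `rename=True` is an added assumption so that keys which are not valid Python identifiers do not raise `ValueError`:

```python
import json
from collections import namedtuple

# A trimmed copy of the report JSON from the question (two results only).
payload = (
    '{"id": 68, "name": "Carrier Rates", "results": ['
    '{"id": 68, "content": "https://test-staging.s3.amazonaws.com/reports/'
    'manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},'
    '{"id": 67, "content": "https://test-staging.s3.amazonaws.com/reports/'
    'manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"}]}'
)


def as_namedtuple(d):
    # object_hook is called for every JSON object, innermost first, so the
    # nested result dicts become namedtuples before the outer dict does.
    # rename=True replaces keys that are not valid identifiers with _0, _1, ...
    return namedtuple("X", d.keys(), rename=True)(*d.values())


report = json.loads(payload, object_hook=as_namedtuple)

# "content" is an attribute of each element of report.results,
# not of the top-level object.
csv_urls = [result.content for result in report.results]
for csv_url in csv_urls:
    print(csv_url)
```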

import json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()

# assuming here you got that json content
s='{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}'

d=json.loads(s)

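for f in d['results']:
    # each entry's "content" field holds the URL of one CSV file
    csv_url = f['content']
    # download it, naming the local file after the last part of the URL
    urllib.request.urlretrieve(csv_url, csv_url.rsplit('/', 1)[-1])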
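Whatever download call is used in that loop, each URL needs a local file name. One option (an assumption, not part of the answer) is to take the last segment of the URL path while ignoring any query string or fragment; `local_name` and its `default` parameter are hypothetical names:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_name(csv_url, default="download.csv"):
    """Return a file name for csv_url based on the last segment of its path.

    urlsplit separates the query string and fragment from the path, so a URL
    like ".../report.csv?token=abc" still yields "report.csv".
    """
    path = unquote(urlsplit(csv_url).path)
    name = posixpath.basename(path)
    # Fall back to a fixed name if the path ends in "/" or is not a usable name.
    if name in ("", ".", ".."):
        return default
    return name
```

Unlike a plain `rsplit('/')`, this does not carry a `?query` part into the file name.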

The following code should help.

    import json
    import urllib.request

    with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
        s = url.read()
    # json.loads parses a str/bytes value; json.load expects a file-like object
    loadJson = json.loads(s)
    results = loadJson["results"]
    csvLinks = []
    for result in results:
        csvLinks.append(result["content"])

Now you have a list of links to the CSV files. Use urllib to download them.
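A sketch of that last step, assuming the links are plain URLs that need no extra authentication; `download_all` and its `target_dir` parameter are hypothetical names. It streams each response to disk with `shutil.copyfileobj` instead of reading whole files into memory:

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlsplit


def download_all(csv_links, target_dir="."):
    """Download every URL in csv_links into target_dir.

    Returns the list of local file paths that were written. Each file is
    named after the last segment of its URL path (query string ignored).
    """
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for link in csv_links:
        name = posixpath.basename(urlsplit(link).path) or "download.csv"
        local_path = os.path.join(target_dir, name)
        # Copy the response body to disk in chunks.
        with urllib.request.urlopen(link) as response, open(local_path, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(local_path)
    return saved
```

Called as `download_all(csvLinks, "reports")`, it would write each CSV into a `reports` folder. Because files are named after the last URL segment, two links ending in the same name would overwrite each other; the links in this report all end in distinct UUID-based names.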


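Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.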

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM