Working with multiple JSONs from API calls in Python

Question

I'm trying to make multiple API calls to retrieve JSON files. The JSONs all follow the same schema. I want to merge all the JSON files into one so that I can do two things:

1) Extract all the IP addresses from the JSON to work with later
2) Convert the JSON into a pandas DataFrame

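When I first wrote the code, I made a single request, which returned one JSON I could work with. Now I use a for loop to collect multiple JSONs and append them to a list called results_list, so that each new response doesn't overwrite the previous one.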

Here is the code:

import json
import requests

headers = {
    'Accept': 'application/json',
    'key': 'MY_API_KEY'
}

url = 'https://API_URL'
query_type = 'QUERY_TYPE'

locations_list = ['London', 'Amsterdam', 'Berlin']

results_list = []

for location in locations_list:
    r = requests.get(url, params={'query': str(query_type) + str(location)}, headers=headers)
    # Append the parsed JSON body (a dict); a Response object itself
    # cannot be serialized by json.dump()
    results_list.append(r.json())

with open('my_search_results.json', 'w') as outfile:
    json.dump(results_list, outfile)

The JSON file my_search_results.json has a separate entry for each API query, e.g. index 0 is London, 1 is Amsterdam, 2 is Berlin, and so on.
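Because results_list is a Python list, json.dump() writes a JSON array (`[...]`) rather than a JSON object, and json.load() hands it back as a list. A minimal round-trip sketch, assuming a heavily shortened stand-in for the real API responses (the sample values are made up for illustration):

```python
import json
import os
import tempfile

# Hypothetical stand-in for results_list: one dict per API call
results_list = [
    {"complete": True, "count": 1, "data": [{"ip": "1.2.3.4"}]},
    {"complete": True, "count": 1, "data": [{"ip": "5.6.7.8"}]},
]

# Write the list to disk, as the question's code does
path = os.path.join(tempfile.mkdtemp(), "my_search_results.json")
with open(path, "w") as outfile:
    json.dump(results_list, outfile)

# Read it back: the top-level value is a list, not a dict
with open(path) as infile:
    loaded = json.load(infile)

print(type(loaded).__name__)       # list
print(type(loaded[0]).__name__)    # dict
print(loaded[0]["data"][0]["ip"])  # 1.2.3.4
```

So whatever reads the file (or results_list in memory) has to index by position first, and only then by key.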

    [
    {
        "complete": true,
        "count": 51,
        "data": [
            {
                "actor": "unknown",
                "classification": "malicious",
                "cve": [],
                "first_seen": "2020-03-11",
                "ip": "1.2.3.4",
                "last_seen": "2020-03-28",
                "metadata": {
                    "asn": "xxxxx",
                    "category": "isp",
                    "city": "London",
                    "country": "United Kingdom",
                    "country_code": "GB",
                    "organization": "British Telecommunications PLC",
                    "os": "Linux 2.2-3.x",
                    "rdns": "xxxx",
                    "tor": false
                },
                "raw_data": {
                    "ja3": [],
                    "scan": [
                        {
                            "port": 23,
                            "protocol": "TCP"
                        },
                        {
                            "port": 81,
                            "protocol": "TCP"
                        }
                    ],
                    "web": {}
                },
                "seen": true,
                "spoofable": false,
                "tags": [
                    "some tag",

                ]
            }

(I've redacted all sensitive data. For each API request there is a separate entry in the JSON representing each city, but it is too large to show here.)
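If the goal is one merged JSON, one option is to concatenate the `data` lists of every response into a single flat list of records, so each record is one IP entry regardless of which city query returned it. A minimal sketch, assuming each response has the `complete`/`count`/`data` shape shown above; `merge_data` is a hypothetical helper name and the sample data is made up:

```python
import json
from typing import Any, Dict, List

def merge_data(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate the 'data' lists of several API responses into one list."""
    merged: List[Dict[str, Any]] = []
    for response in results:
        # .get() tolerates a response that has no 'data' key
        merged.extend(response.get("data", []))
    return merged

# Hypothetical stand-in for results_list
results_list = [
    {"complete": True, "count": 2, "data": [{"ip": "1.2.3.4"}, {"ip": "1.2.3.5"}]},
    {"complete": True, "count": 1, "data": [{"ip": "5.6.7.8"}]},
]

merged = merge_data(results_list)
print(len(merged))  # 3

# The merged records serialize as one JSON array
print(json.dumps(merged))
```

This drops the per-response fields such as `complete` and `count`; if those matter, they would need to be kept separately.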

Now I want to go through the JSON and pick out all the IP addresses:

for d in results_list['data']:
        ips = (d['ip'])
        print(ips)

However, this gives the error:

TypeError: list indices must be integers or slices, not str
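The error can be reproduced with a tiny stand-in list (hypothetical data): indexing a list with the string 'data' fails, while indexing it with an integer first and then with 'data' works.

```python
# Hypothetical two-element stand-in for results_list
results_list = [
    {"data": [{"ip": "1.2.3.4"}]},
    {"data": [{"ip": "5.6.7.8"}]},
]

try:
    results_list["data"]  # a list only accepts integer or slice indices
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # list indices must be integers or slices, not str

# Index the list with an integer first, then the dict with 'data'
first_ip = results_list[0]["data"][0]["ip"]
print(first_ip)  # 1.2.3.4
```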

When I was working with a single JSON from a single API request this worked fine, but now it seems that either the JSON is not formatted properly or Python is seeing my big JSON as a list and not a dictionary, even though I used json.dump() on results_list earlier in the script. I'm sure this has to do with the way I had to collect all the API calls and append them to a list, but I can't figure out where I'm going wrong.

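I'm struggling to figure out how to pick out the IP addresses, or whether there is a better way to collect and merge multiple JSONs. Any advice appreciated.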

Answer 1

To get the IPs, try:

# results_list is a list of dicts, so take each dict first,
# then index into its 'data' list to reach the dict holding 'ip'
for result in results_list:
    ips = result['data'][0]['ip']
    print(ips)

The reason you get the error:

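The value of the data key is a list containing the dictionaries that hold the ip you need, and results_list itself is also a list. So when you index either of these lists with a string, as in results_list['data'], Python raises the error: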

TypeError: list indices must be integers or slices, not str

So, given:

results_list= [
    {
        "complete": True,
        "count": 51,
        "data": [
            {
                "actor": "unknown",
                "classification": "malicious",
                "cve": [],
                "first_seen": "2020-03-11",
                "ip": "1.2.3.4",
                "last_seen": "2020-03-28",
                "metadata": {
                    "asn": "xxxxx",
                    "category": "isp",
                    "city": "London",
                    "country": "United Kingdom",
                    "country_code": "GB",
                    "organization": "British Telecommunications PLC",
                    "os": "Linux 2.2-3.x",
                    "rdns": "xxxx",
                    "tor": False
                },
                "raw_data": {
                    "ja3": [],
                    "scan": [
                        {
                            "port": 23,
                            "protocol": "TCP"
                        },
                        {
                            "port": 81,
                            "protocol": "TCP"
                        }
                    ],
                    "web": {}
                },
                "seen": True,
                "spoofable": False,
                "tags": [
                    "some tag",

                ]
            }...(here is your rest data)
         ]}]

To get all the IP addresses, run:

ip_address = []
# Option 1: results_list holds one separate dictionary per API call.
# Note: this takes only the first record in each 'data' list (one IP per call)
for d in results_list:
    ips = d['data'][0]['ip']
    ip_address.append(ips)
    print(ips)
# Option 2: all the records are inside the 'data' list of the first result
for d in results_list[0]['data']:
    ips = d['ip']
    ip_address.append(ips)
    print(ips)

Answer 2

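results_list is a list, not a dictionary, so results_list['data'] raises an error. Instead, you should take each dictionary from that list and then access its 'data' key. Also note that the value of the 'data' key is a list, so you also need to access the elements of that list: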

for result in results_list:
    for d in result["data"]:
        ips = d["ip"]
        print(ips)
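Since the question wants the IPs for later use, printing them may not be enough. A sketch that collects every IP from every response into a list, then drops duplicates while keeping first-seen order (the sample data is hypothetical):

```python
# Hypothetical stand-in for results_list
results_list = [
    {"data": [{"ip": "1.2.3.4"}, {"ip": "1.2.3.5"}]},
    {"data": [{"ip": "5.6.7.8"}, {"ip": "1.2.3.4"}]},
]

# Nested comprehension: outer loop over responses, inner loop over records
all_ips = [d["ip"] for result in results_list for d in result["data"]]

# dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats
unique_ips = list(dict.fromkeys(all_ips))

print(all_ips)     # ['1.2.3.4', '1.2.3.5', '5.6.7.8', '1.2.3.4']
print(unique_ips)  # ['1.2.3.4', '1.2.3.5', '5.6.7.8']
```

Whether to deduplicate depends on the use case: the same IP can legitimately appear in the results for more than one query.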

If you know your JSON list has only one element, you can simplify this to:

for d in results_list[0]["data"]:
    ips = d["ip"]
    print(ips)
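For the question's second goal, turning the results into a pandas DataFrame, the nested records first need to become flat rows. A stdlib-only sketch, assuming the record shape from the question; `flatten_record` is a hypothetical helper, nested keys such as those under `metadata` are joined with `.`, and the sample data is made up:

```python
from typing import Any, Dict, List

def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '.'."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as 'metadata'
            flat.update(flatten_record(value, prefix=f"{name}."))
        else:
            # Lists (e.g. 'tags', 'cve') are kept as-is in one column
            flat[name] = value
    return flat

# Hypothetical stand-in for results_list
results_list = [
    {"data": [{"ip": "1.2.3.4", "metadata": {"city": "London", "tor": False}, "tags": ["some tag"]}]},
    {"data": [{"ip": "5.6.7.8", "metadata": {"city": "Berlin", "tor": False}, "tags": []}]},
]

# One flat row per record, across all responses
rows: List[Dict[str, Any]] = [
    flatten_record(d) for result in results_list for d in result["data"]
]
for row in rows:
    print(row)
```

With pandas installed, `pd.DataFrame(rows)` then gives one row per IP record. pandas also provides `pd.json_normalize(results_list, record_path='data')`, which does a similar flattening and also uses `.` as its default separator.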

Answer 1: score 1, posted 2020-04-08 11:34:18
Answer 2: score 0, accepted, posted 2020-04-08 11:36:08
