
Writing multiple JSON objects as one object to a single file with python

I'm using python to hit a Foreman API to gather some facts about all the hosts that Foreman knows about. Unfortunately, there is no get-all-hosts-facts call (or anything similar) in the v1 Foreman API, so I have to loop through all the hosts and fetch the information one host at a time. Doing so has led me to an annoying problem. Each call for a given host returns a JSON object like so:

{
  "host1.com": {
    "apt_update_last_success": "1452187711", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
   }
}

This is totally fine; the issue arises when I append the next host's information. I then get a JSON file that looks something like this:

{
  "host1.com": {
    "apt_update_last_success": "1452187711", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
}
}{
"host2.com": {
    "apt_update_last_success": "1452703454", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
   }
}
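
To make the problem concrete, here is a minimal sketch (the host names and the single fact are shortened stand-ins for the data above, not real Foreman output) showing that Python's json module rejects two objects written back to back:

```python
import json

# Two complete JSON objects written one after the other, which is what
# appending to the results file on every loop iteration produces.
first = json.dumps({"host1.com": {"architecture": "amd64"}}, indent=4)
second = json.dumps({"host2.com": {"architecture": "amd64"}}, indent=4)
appended = first + second

try:
    json.loads(appended)
    parsed_ok = True
except json.JSONDecodeError as err:
    # The parser reads the first object, then complains about the leftover text.
    parsed_ok = False
    error_message = err.msg

print(parsed_ok)
```

Each piece is valid JSON on its own, but a JSON document may contain only one top-level value, so the combined file cannot be loaded.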

Here's the code that's doing this:

for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
            .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    with open(results_file, 'a') as f:
        f.write(json.dumps(facts_data, sort_keys=True, indent=4))

Here's what I need the file to look like:

{
"host1.com": {
    "apt_update_last_success": "1452187711",
    "architecture": "amd64",
    "augeasversion": "1.2.0",
    "bios_release_date": "06/03/2015",
    "bios_vendor": "Dell Inc."
},
"host2.com": {
    "apt_update_last_success": "1452703454",
    "architecture": "amd64",
    "augeasversion": "1.2.0",
    "bios_release_date": "06/03/2015",
    "bios_vendor": "Dell Inc."
  }
}

It would be better to assemble all of your data into one dict and then write it out once, instead of writing on every pass through the loop.

# relies on json, sys, requests, log and the other names from the surrounding script
d = {}
for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
    except requests.exceptions.RequestException as e:
        log.error(e)
        continue  # no response to parse, move on to the next host
    # check the response of this request (facts), not the earlier hosts call
    if facts.status_code != 200:
        log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                  .format(facts.status_code, facts.text))
        sys.exit(1)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    d.update(facts_data)  # add this host's facts to the combined dict
# write everything once at the end; 'w' so a rerun replaces the file instead of appending
with open(results_file, 'w') as f:
    f.write(json.dumps(d, sort_keys=True, indent=4))

Instead of writing JSON inside the loop, insert the data into a dict with the correct structure. Then write that dict out as JSON when the loop is finished.

This assumes your dataset fits into memory.
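
A minimal, self-contained sketch of that idea follows. The fetch_facts function is a hypothetical stand-in for the real Foreman request, and the host names, facts, and temporary file path are made up for illustration:

```python
import json
import os
import tempfile

def fetch_facts(host_name):
    # Hypothetical stand-in for the Foreman call; returns {host_name: {facts}}.
    return {host_name: {"architecture": "amd64", "bios_vendor": "Dell Inc."}}

def collect_facts(host_names):
    all_facts = {}
    for name in host_names:
        # Each response has the host name as its only top-level key,
        # so update() adds one entry per host to the combined dict.
        all_facts.update(fetch_facts(name))
    return all_facts

def write_facts(all_facts, path):
    # 'w' replaces any earlier contents, so the file holds exactly one JSON object.
    with open(path, "w") as f:
        json.dump(all_facts, f, sort_keys=True, indent=4)

results_path = os.path.join(tempfile.mkdtemp(), "facts.json")
write_facts(collect_facts(["host1.com", "host2.com"]), results_path)

# Reading the file back confirms it parses as a single object.
with open(results_path) as f:
    reloaded = json.load(f)
```

Because nothing is written until the loop has finished, a failure partway through leaves no partial results file behind; whether that is acceptable depends on how long the collection takes.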

For safety/consistency, you need to load in the old data, mutate it, then write it back out.

Change the current with block and its write call to:

# 'r+' needs the file to exist and to already hold a JSON object (e.g. "{}").
# Avoid 'a+' here: in append mode, writes can go to the end of the file regardless of seek().
with open(results_file, 'r+') as f:
    combined_facts = json.load(f)
    combined_facts.update(facts_data)
    f.seek(0)  # go back to the start so the new JSON overwrites the old
    json.dump(combined_facts, f, sort_keys=True, indent=4)
    f.truncate()  # In case new JSON encoding smaller, e.g. due to replaced key

Note: If possible, use pault's answer (collect everything in one dict and write it once) to minimize unnecessary I/O. This is just how you'd do it if the data retrieval should be done piecemeal, with immediate updates for each item as it becomes available.
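
The load, update, write-back cycle can also be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: the name update_json_file is hypothetical, it treats a missing file as an empty object, and it swaps in a write-to-temporary-file-then-os.replace step so that an interrupted write does not leave a half-written results file.

```python
import json
import os
import tempfile

def update_json_file(path, new_items):
    """Merge new_items into the JSON object stored at path and rewrite the file."""
    try:
        with open(path) as f:
            combined = json.load(f)
    except FileNotFoundError:
        combined = {}  # first run: nothing has been saved yet
    combined.update(new_items)

    # Write to a temporary file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(combined, tmp, sort_keys=True, indent=4)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise
    return combined

# Example with made-up data: two separate updates end up in one JSON object.
demo_path = os.path.join(tempfile.mkdtemp(), "facts.json")
update_json_file(demo_path, {"host1.com": {"architecture": "amd64"}})
update_json_file(demo_path, {"host2.com": {"architecture": "amd64"}})

with open(demo_path) as f:
    stored = json.load(f)
```

Inside the loop from the question, the call would be update_json_file(results_file, facts_data) in place of the append. The whole file is still re-read and rewritten for every host, which is the I/O cost the note above refers to.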

FYI, the unsafe way is basically to find the trailing curly brace, delete it, then write out a comma followed by the new JSON (with the leading curly brace removed from its JSON representation). It's much less I/O intensive, but it's also less safe: it doesn't clean out duplicates, doesn't sort the hosts, doesn't validate the input file at all, etc. So don't do it.
