
Create a list with items from a nested dict key value pair

I would like to create a new list with items from a large nested dict.

Here is a snippet of the nested dict:

AcceptedAnswersPython_combined.json

{
  "items": [
    {
      "answers": [
        {
          "creation_date": 1533083368,
          "is_accepted": false
        },
        {
          "creation_date": 1533083567,
          "is_accepted": false
        },
        {
          "creation_date": 1533083754,
          "is_accepted": true
        },
        {
          "creation_date": 1533084669,
          "is_accepted": false
        },
        {
          "creation_date": 1533089107,
          "is_accepted": false
        }
      ],
      "creation_date": 1533083248,
      "tags": [
        "python",
        "pandas",
        "dataframe"
      ]
    },
    {
      "answers": [
        {
          "creation_date": 1533084137,
          "is_accepted": true
        }
      ],
      "creation_date": 1533083367,
      "tags": [
        "python",
        "binary-search-tree"
      ]
    }
  ]
} 

The new list should contain the creation_date of each item as many times as there are dicts inside its answers list. So for the snippet above, the new list should look like this:

question_date_per_answer = [[1533083248, 1533083248, 1533083248 , 1533083248, 1533083248], [1533083367]]

I need this new list because I would like to determine the difference between each answer's creation_date and the creation_date of its associated question (stated inside each dict in items).

In a pandas DataFrame, the result should look like this:

     question creation date answer creation date  
0          1533083248             1533083368               
1          1533083248             1533083567               
2          1533083248             1533083754                
3          1533083248             1533084669               
4          1533083248             1533089107               
5          1533083367             1533084137

I can iterate through all questions like so:

import json
with open('AcceptedAnswersPython_combined.json') as f:
    items = json.load(f)['items']
question_creation_date = [item['creation_date'] for item in items]

But this leaves me with a list whose length does not match the number of answer creation_date values.

I can't get my head around this.
So how do I create a list in which the number of question creation dates equals the number of answer creation dates (like question_date_per_answer)?

Thanks in advance.

You need to iterate over each item's "answers" list and read creation_date from each answer. Pairing that value with the question's own creation_date gives one (question date, answer date) tuple per answer.

my_json = """{
"items": [
    {
    "answers": [
        {
        "creation_date": 1533083368,
        "is_accepted": false
        },
        {
        "creation_date": 1533083567,
        "is_accepted": false
        },
        {
        "creation_date": 1533083754,
        "is_accepted": true
        },
        {
        "creation_date": 1533084669,
        "is_accepted": false
        },
        {
        "creation_date": 1533089107,
        "is_accepted": false
        }
    ],
    "creation_date": 1533083248,
    "tags": [
        "python",
        "pandas",
        "dataframe"
    ]
    },
    {
    "answers": [
        {
        "creation_date": 1533084137,
        "is_accepted": true
        }
    ],
    "creation_date": 1533083367,
    "tags": [
        "python",
        "binary-search-tree"
    ]
    }
]
}"""

import json

data = json.loads(my_json)
dates = [(question["creation_date"], answer["creation_date"])
         for question in data["items"] for answer in question["answers"]]
print(dates)
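If the nested list from the question is still wanted, it can be built with the same kind of comprehension. The following is a minimal sketch, not the answerer's code: it uses a trimmed stand-in for the parsed JSON (only the creation_date fields), and the name answer_delays is made up for illustration.

```python
# Trimmed stand-in for the parsed JSON; same shape as the question's data.
data = {
    "items": [
        {
            "creation_date": 1533083248,
            "answers": [
                {"creation_date": 1533083368},
                {"creation_date": 1533083567},
                {"creation_date": 1533083754},
                {"creation_date": 1533084669},
                {"creation_date": 1533089107},
            ],
        },
        {
            "creation_date": 1533083367,
            "answers": [{"creation_date": 1533084137}],
        },
    ]
}

# Repeat each question's date once per answer (one inner list per question).
question_date_per_answer = [
    [item["creation_date"]] * len(item["answers"]) for item in data["items"]
]

# Seconds between each question and each of its answers, as one flat list.
# "answer_delays" is a hypothetical name chosen for this sketch.
answer_delays = [
    answer["creation_date"] - item["creation_date"]
    for item in data["items"]
    for answer in item["answers"]
]

print(question_date_per_answer)
print(answer_delays)  # [120, 319, 506, 1421, 5859, 770]
```

Computing the difference directly in the comprehension avoids having to keep two separate lists aligned.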

You can still work with the list at hand.
Let's try making a DataFrame from the list that you already have:

import pandas as pd
l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]
df = pd.DataFrame(l)

Unfortunately you get the following:

0   1   2   3   4
0   1533083248  1.533083e+09    1.533083e+09    1.533083e+09    1.533083e+09
1   1533083367  NaN     NaN     NaN     NaN

So we need to transpose it. To do that:

from itertools import zip_longest
k = list(zip_longest(*l))  # plain zip would truncate to the length of the shortest list
df = pd.DataFrame(k)

Output:

0   1
0   1533083248  1.533083e+09
1   1533083248  NaN
2   1533083248  NaN
3   1533083248  NaN
4   1533083248  NaN

Now we forward fill the NaNs with the previous value using df.fillna(method='ffill') (recent pandas versions prefer the equivalent df.ffill()).
Whole snippet:

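from itertools import zip_longest
import pandas as pd

l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]
k = list(zip_longest(*l))  # pad the shorter list with None instead of truncating
df = pd.DataFrame(k)
df = df.ffill()  # same effect as df.fillna(method='ffill'), which newer pandas deprecates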

Voila:

    0   1
0   1533083248  1.533083e+09
1   1533083248  1.533083e+09
2   1533083248  1.533083e+09
3   1533083248  1.533083e+09
4   1533083248  1.533083e+09
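The frame above has one column per question. The long layout shown in the question (one row per answer) can also be built directly from (question date, answer date) pairs. This is a rough sketch under assumptions: the pairs are typed in by hand from the sample data, and the column name seconds_to_answer is made up for illustration.

```python
import pandas as pd

# (question creation_date, answer creation_date) pairs from the sample data.
pairs = [
    (1533083248, 1533083368),
    (1533083248, 1533083567),
    (1533083248, 1533083754),
    (1533083248, 1533084669),
    (1533083248, 1533089107),
    (1533083367, 1533084137),
]

# One row per answer, using the column labels from the question's table.
df = pd.DataFrame(
    pairs, columns=["question creation date", "answer creation date"]
)

# Hypothetical extra column: seconds elapsed between question and answer.
df["seconds_to_answer"] = (
    df["answer creation date"] - df["question creation date"]
)

print(df)
```

Because both columns stay integers, no NaN padding or forward fill is needed in this layout.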
