Create a list containing items from nested dict key-value pairs

Question

I would like to create a new list with items from a large nested dict.

Here is a snippet of the nested dict:

AcceptedAnswersPython_combined.json

{
  "items": [
    {
      "answers": [
        {
          "creation_date": 1533083368,
          "is_accepted": false
        },
        {
          "creation_date": 1533083567,
          "is_accepted": false
        },
        {
          "creation_date": 1533083754,
          "is_accepted": true
        },
        {
          "creation_date": 1533084669,
          "is_accepted": false
        },
        {
          "creation_date": 1533089107,
          "is_accepted": false
        }
      ],
      "creation_date": 1533083248,
      "tags": [
        "python",
        "pandas",
        "dataframe"
      ]
    },
    {
      "answers": [
        {
          "creation_date": 1533084137,
          "is_accepted": true
        }
      ],
      "creation_date": 1533083367,
      "tags": [
        "python",
        "binary-search-tree"
      ]
    }
  ]
}

The new list should contain the creation_date of each item as many times as there are dicts inside its answers list. So for the snippet above, the new list should look like this:

question_date_per_answer = [[1533083248, 1533083248, 1533083248 , 1533083248, 1533083248], [1533083367]]

The reason I need this new list is that I would like to determine the difference between each answer's creation_date and its associated question's creation_date (stated inside each dict in items).

In a pandas DataFrame, this new list should look like this:

     question creation date answer creation date  
0          1533083248             1533083368               
1          1533083248             1533083567               
2          1533083248             1533083754                
3          1533083248             1533084669               
4          1533083248             1533089107               
5          1533083367             1533084137

I can iterate through all questions like so:

import json
with open('AcceptedAnswersPython_combined.json') as f:
    items = json.load(f)['items']
question_creation_date = [item['creation_date'] for item in items]

But this leaves me with a list whose length does not match the number of answer creation_date values.

I can't get my head around this.
So how do I create a list in which the number of question creation dates equals the number of answer creation dates (like question_date_per_answer)?

Thanks in advance.

Answer 1

You need to iterate over item["answers"] and then get creation_date for each answer in order to get the answer creation dates.

my_json = """{
"items": [
    {
    "answers": [
        {
        "creation_date": 1533083368,
        "is_accepted": false
        },
        {
        "creation_date": 1533083567,
        "is_accepted": false
        },
        {
        "creation_date": 1533083754,
        "is_accepted": true
        },
        {
        "creation_date": 1533084669,
        "is_accepted": false
        },
        {
        "creation_date": 1533089107,
        "is_accepted": false
        }
    ],
    "creation_date": 1533083248,
    "tags": [
        "python",
        "pandas",
        "dataframe"
    ]
    },
    {
    "answers": [
        {
        "creation_date": 1533084137,
        "is_accepted": true
        }
    ],
    "creation_date": 1533083367,
    "tags": [
        "python",
        "binary-search-tree"
    ]
    }
]
}"""

import json

data = json.loads(my_json)
dates = [(question["creation_date"], answer["creation_date"])
         for question in data["items"] for answer in question["answers"]]
print(dates)
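
If the nested layout from the question (question_date_per_answer) is needed rather than flat pairs, a nested comprehension is one option. The following is a minimal sketch on a trimmed copy of the sample data; the name delays is illustrative and does not appear in the original post.

```python
# Trimmed copy of the sample data from the question (only the fields used here).
data = {
    "items": [
        {
            "creation_date": 1533083248,
            "answers": [
                {"creation_date": 1533083368},
                {"creation_date": 1533083567},
                {"creation_date": 1533083754},
                {"creation_date": 1533084669},
                {"creation_date": 1533089107},
            ],
        },
        {
            "creation_date": 1533083367,
            "answers": [{"creation_date": 1533084137}],
        },
    ]
}

# Repeat each question's creation_date once per answer it received.
question_date_per_answer = [
    [question["creation_date"]] * len(question["answers"])
    for question in data["items"]
]

# Seconds between each question and each of its answers, as one flat list
# (hypothetical name, added for illustration).
delays = [
    answer["creation_date"] - question["creation_date"]
    for question in data["items"]
    for answer in question["answers"]
]

print(question_date_per_answer)
print(delays)
```

Computing the differences directly in the comprehension avoids having to keep two separate lists aligned.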

Answer 2

You can still work with the list at hand.
Let's try making a dataframe from the list that you already have:

import pandas as pd
l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]
df = pd.DataFrame(l)

Unfortunately, you get the following:

            0             1             2             3             4
0  1533083248  1.533083e+09  1.533083e+09  1.533083e+09  1.533083e+09
1  1533083367           NaN           NaN           NaN           NaN

So we need to transpose it. For that, let's do the following:

from itertools import zip_longest
# zip_longest is used because plain zip would truncate to the length of the shortest list.
k = list(zip_longest(*l))
df = pd.DataFrame(k)

Output:

            0             1
0  1533083248  1.533083e+09
1  1533083248           NaN
2  1533083248           NaN
3  1533083248           NaN
4  1533083248           NaN

Now we forward fill the NaNs with the previous value using df.fillna(method='ffill').
Whole snippet:

from itertools import zip_longest
import pandas as pd
l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]
k = list(zip_longest(*l))
df = pd.DataFrame(k)
df.fillna(method='ffill')

Voilà:

            0             1
0  1533083248  1.533083e+09
1  1533083248  1.533083e+09
2  1533083248  1.533083e+09
3  1533083248  1.533083e+09
4  1533083248  1.533083e+09
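
For reference, the transpose-and-fill steps can also be run as one self-contained script. This is only a sketch, assuming pandas is installed. It uses DataFrame.ffill(), which performs the same forward fill as fillna(method='ffill') and avoids the deprecation warning that newer pandas releases emit for the method argument. The final astype("int64") cast and the name filled are additions, not part of the answer, used to turn the floats back into integer timestamps.

```python
from itertools import zip_longest

import pandas as pd

l = [[1533083248] * 5, [1533083367]]

# Transpose the ragged lists; positions missing from the shorter list become None.
k = list(zip_longest(*l))
df = pd.DataFrame(k)

# Forward-fill the gaps, then cast back to integers (the cast is an addition).
filled = df.ffill().astype("int64")
print(filled)
```

Note that each column here holds one question's date, so the column count grows with the number of questions.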

2 answers

Answer 1
0 Accepted 2018-10-06 11:18:46

Answer 2
0 2018-10-06 11:28:09
