Create a list with items from a nested dict key-value pair
I would like to create a new list with items from a large nested dict.
Here is a snippet of the nested dict:
AcceptedAnswersPython_combined.json
{
"items": [
{
"answers": [
{
"creation_date": 1533083368,
"is_accepted": false
},
{
"creation_date": 1533083567,
"is_accepted": false
},
{
"creation_date": 1533083754,
"is_accepted": true
},
{
"creation_date": 1533084669,
"is_accepted": false
},
{
"creation_date": 1533089107,
"is_accepted": false
}
],
"creation_date": 1533083248,
"tags": [
"python",
"pandas",
"dataframe"
]
},
{
"answers": [
{
"creation_date": 1533084137,
"is_accepted": true
}
],
"creation_date": 1533083367,
"tags": [
"python",
"binary-search-tree"
]
}
]
}
The new list should contain the creation_date of each item as many times as there are dicts inside its answers list.
So for the snippet above, the new list should look like this:
question_date_per_answer = [[1533083248, 1533083248, 1533083248 , 1533083248, 1533083248], [1533083367]]
The reason I need this new list is that I would like to determine the difference between each answer's creation_date and its associated question's creation_date (stated inside each items dict).
This new list should look like this in a pandas DataFrame:
   question creation date  answer creation date
0              1533083248            1533083368
1              1533083248            1533083567
2              1533083248            1533083754
3              1533083248            1533084669
4              1533083248            1533089107
5              1533083367            1533084137
I can iterate through all questions like so:
import json
items = json.load(open('AcceptedAnswersPython_combined.json'))['items']
question_creation_date = [item['creation_date'] for item in items]
But this leaves me with a list whose length does not match the number of answer creation_date values.
I can't get my head around this.
So how do I create such a list, where the number of question creation dates equals the number of answer creation dates (like question_date_per_answer)?
Thanks in advance.
You need to iterate over item["answers"] and read creation_date from each answer in order to get the answer creation dates.
my_json = """{
"items": [
{
"answers": [
{
"creation_date": 1533083368,
"is_accepted": false
},
{
"creation_date": 1533083567,
"is_accepted": false
},
{
"creation_date": 1533083754,
"is_accepted": true
},
{
"creation_date": 1533084669,
"is_accepted": false
},
{
"creation_date": 1533089107,
"is_accepted": false
}
],
"creation_date": 1533083248,
"tags": [
"python",
"pandas",
"dataframe"
]
},
{
"answers": [
{
"creation_date": 1533084137,
"is_accepted": true
}
],
"creation_date": 1533083367,
"tags": [
"python",
"binary-search-tree"
]
}
]
}"""
import json
data = json.loads(my_json)
dates = [(question["creation_date"], answer["creation_date"])
for question in data["items"] for answer in question["answers"]]
print(dates)
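If the nested list from the question is wanted literally, the same iteration can produce it. The following is a minimal sketch, not a definitive implementation: the data dict is an abbreviated stand-in for the parsed JSON, and the name answer_delays is hypothetical.

```python
# Abbreviated, hypothetical stand-in for the parsed JSON from the question.
data = {
    "items": [
        {"creation_date": 1533083248,
         "answers": [{"creation_date": 1533083368},
                     {"creation_date": 1533083567},
                     {"creation_date": 1533083754},
                     {"creation_date": 1533084669},
                     {"creation_date": 1533089107}]},
        {"creation_date": 1533083367,
         "answers": [{"creation_date": 1533084137}]},
    ]
}

# One inner list per question: its creation_date repeated once per answer.
question_date_per_answer = [
    [question["creation_date"]] * len(question["answers"])
    for question in data["items"]
]

# Seconds between each question and each of its answers, as one flat list.
answer_delays = [
    answer["creation_date"] - question["creation_date"]
    for question in data["items"]
    for answer in question["answers"]
]

print(question_date_per_answer)
print(answer_delays)
```

Since the goal is the difference between the two dates, the flat answer_delays list may be all that is needed, and the repeated question dates become an intermediate step rather than a requirement.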
You can still work with the list at hand.
Let's try making a DataFrame from the list that you already have:
import pandas as pd
l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]
df = pd.DataFrame(l)
Unfortunately, you get the following:
0 1 2 3 4
0 1533083248 1.533083e+09 1.533083e+09 1.533083e+09 1.533083e+09
1 1533083367 NaN NaN NaN NaN
So we need to transpose it. For that, let's do the following:
from itertools import zip_longest
# zip_longest pads the shorter list; plain zip would truncate to the length of the shortest list.
k = list(zip_longest(*l))
df = pd.DataFrame(k)
Output:
0 1
0 1533083248 1.533083e+09
1 1533083248 NaN
2 1533083248 NaN
3 1533083248 NaN
4 1533083248 NaN
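The choice of zip_longest over plain zip matters for this transpose. A minimal sketch, using the same list l as above, shows the difference; the names truncated and padded are illustrative only.

```python
from itertools import zip_longest

l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]

# zip stops at the shortest input, so only one row survives.
truncated = list(zip(*l))

# zip_longest keeps going and pads the shorter input with None (the default fillvalue).
padded = list(zip_longest(*l))

print(truncated)
print(padded)
```

The None padding is what shows up as NaN once the tuples are placed in a DataFrame.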
Now we forward-fill the NaNs with the previous value using df.fillna(method='ffill'). Newer pandas releases deprecate the method argument in favor of the equivalent df.ffill().
Whole snippet:
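from itertools import zip_longest
import pandas as pd
l = [[1533083248, 1533083248, 1533083248, 1533083248, 1533083248], [1533083367]]
k = list(zip_longest(*l))  # transpose, padding the shorter list with None
df = pd.DataFrame(k)
df = df.ffill()  # forward fill; same as fillna(method='ffill') on older pandas
print(df)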
Voila:
0 1
0 1533083248 1.533083e+09
1 1533083248 1.533083e+09
2 1533083248 1.533083e+09
3 1533083248 1.533083e+09
4 1533083248 1.533083e+09
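Note that this result puts each question in its own column and turns the padded column into floats. For the two-column layout shown in the question, one option is to build the DataFrame directly from (question date, answer date) pairs. This is a sketch under assumptions: the pairs list is written out by hand here in place of the parsed JSON, and the "difference" column name is hypothetical.

```python
import pandas as pd

# Hypothetical (question_date, answer_date) pairs, one per answer,
# as a nested comprehension over the JSON items would produce them.
pairs = [
    (1533083248, 1533083368),
    (1533083248, 1533083567),
    (1533083248, 1533083754),
    (1533083248, 1533084669),
    (1533083248, 1533089107),
    (1533083367, 1533084137),
]

df = pd.DataFrame(pairs, columns=["question creation date", "answer creation date"])

# Seconds elapsed between each question and its answer.
df["difference"] = df["answer creation date"] - df["question creation date"]

print(df)
```

Because every row is complete, no padding or forward fill is needed and the columns keep an integer dtype.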