
Pandas Dataframe - list of dicts to separate columns

I have a dataframe, df, with a column in which each cell is a list of dictionaries:

index action
0     [{'action_type': 'landing_page_view', 'value': '1'}, {'action_type': 'link_click', 'value': '1'}, {'action_type': 'page_engagement', 'value': '1'}, {'action_type': 'post_engagement', 'value': '1'}]
1     [{'action_type': 'landing_page_view', 'value': '1'}, {'action_type': 'link_click', 'value': '1'}, {'action_type': 'page_engagement', 'value': '1'}, {'action_type': 'post_engagement', 'value': '1'}]
2     [{'action_type': 'video_view', 'value': '23'}, {'action_type': 'page_engagement', 'value': '23'}, {'action_type': 'post_engagement', 'value': '23'}]

I want to extract the value from each dictionary and put it in its own column, e.g.:

index action_landing_page_view action_link_click action_page_engagement action_post_engagement action_video_view
0     1                        1                 1                      1                      0
1     1                        1                 1                      1                      0
2     0                        0                 23                     23                     23

I have tried df.apply(pd.Series), which splits the dicts out into separate columns but with no column headers.
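For reference, a minimal sketch of what that attempt produces. It assumes the call was made on the action column (df['action'].apply(pd.Series)) and uses a shortened, made-up two-row sample rather than the real data. Each list is spread positionally, so the columns are labelled by position and every cell still holds a whole dict:

```python
import pandas as pd

# Shortened, hypothetical sample with the same shape as the 'action' column.
df = pd.DataFrame({'action': [
    [{'action_type': 'landing_page_view', 'value': '1'},
     {'action_type': 'link_click', 'value': '1'}],
    [{'action_type': 'video_view', 'value': '23'}],
]})

# Each list becomes one row; list positions become the column labels.
split = df['action'].apply(pd.Series)

print(split.columns.tolist())   # integer positions, not action names
print(split.iloc[0, 0])         # still a full {'action_type': ..., 'value': ...} dict
```

Because the labels come from list position rather than from action_type, rows whose dicts are in a different order end up with different actions in the same column.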

The dictionaries in my original dataframe do not follow the same order. E.g. the first dict in row 1 starts with action_type "landing_page_view" whereas row 3 starts with "video_view".

Is it possible to assign values to different columns based on the action_type in the dictionary?

You should first extract the key-value pairs for the column names and column values, then construct a dataframe from them. (I think it does not make sense to have such dictionaries in the first place. It makes more sense to have them like this: {'landing_page_view': 1}.)

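import pandas as pd

# Assumption: index_action is the 'action' column as a plain list of lists of dicts.
index_action = df['action'].tolist()
index_action_extracted = [{} for _ in range(len(index_action))]

for i, list_ in enumerate(index_action):
    for dict_ in list_:
        column_name = dict_['action_type']
        column_value = dict_['value']
        # Store the pair as {action_type: value}; a repeated action_type overwrites the earlier one.
        index_action_extracted[i][column_name] = column_value

# Action types missing from a row come out as NaN and are replaced with 0.
df = pd.DataFrame(index_action_extracted).fillna(0)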

In index_action_extracted we drop the redundant keys 'action_type' and 'value' and build, for each row, a dictionary that maps each action type directly to its value.

We can solve this using list comprehensions:

pd.DataFrame(
    [{e['action_type']: e['value']
      for e in row}
     for row in df['action']]
).fillna(0)

Notice that I reconstructed your dataframe as follows:

import pandas as pd

# A single 'action' column; each cell holds a list of dicts.
df = pd.DataFrame({'action': [
    [
        {'action_type': 'landing_page_view', 'value': '1'},
        {'action_type': 'link_click', 'value': '1'},
        {'action_type': 'page_engagement', 'value': '1'},
        {'action_type': 'post_engagement', 'value': '1'},
    ],
    [
        {'action_type': 'landing_page_view', 'value': '1'},
        {'action_type': 'link_click', 'value': '1'},
        {'action_type': 'page_engagement', 'value': '1'},
        {'action_type': 'post_engagement', 'value': '1'},
    ],
    [
        {'action_type': 'video_view', 'value': '23'},
        {'action_type': 'page_engagement', 'value': '23'},
        {'action_type': 'post_engagement', 'value': '23'},
    ],
]})
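The list comprehension above keeps the values as strings and uses the bare action names as headers. To get closer to the layout requested in the question (integer values, action_ prefix), one possible follow-up is sketched below. The variable name wide and the shortened two-row sample are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Shortened sample with the same structure as the 'action' column.
df = pd.DataFrame({'action': [
    [
        {'action_type': 'landing_page_view', 'value': '1'},
        {'action_type': 'link_click', 'value': '1'},
        {'action_type': 'page_engagement', 'value': '1'},
        {'action_type': 'post_engagement', 'value': '1'},
    ],
    [
        {'action_type': 'video_view', 'value': '23'},
        {'action_type': 'page_engagement', 'value': '23'},
        {'action_type': 'post_engagement', 'value': '23'},
    ],
]})

# Same comprehension as above: one {action_type: value} dict per row.
wide = pd.DataFrame(
    [{e['action_type']: e['value'] for e in row} for row in df['action']]
)

wide = (
    wide.apply(pd.to_numeric)   # convert the string values; missing entries stay NaN
        .fillna(0)              # an action absent from a row counts as 0
        .astype(int)
        .add_prefix('action_')  # e.g. 'video_view' -> 'action_video_view'
)
print(wide)
```

Converting before fillna(0) avoids mixing string values with the integer 0 in the same column.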

I build a list of the dictionary items, map them into a dataframe, and then transpose the dataframe:

import pandas as pd

# Each inner list becomes one row; each dict lands in its own positional column.
df = pd.DataFrame([
    [
        {'action_type': 'landing_page_view', 'value': '1'},
        {'action_type': 'link_click', 'value': '1'},
        {'action_type': 'page_engagement', 'value': '1'},
        {'action_type': 'post_engagement', 'value': '1'},
    ],
    [
        {'action_type': 'landing_page_view', 'value': '1'},
        {'action_type': 'link_click', 'value': '1'},
        {'action_type': 'page_engagement', 'value': '1'},
        {'action_type': 'post_engagement', 'value': '1'},
    ],
    [
        {'action_type': 'video_view', 'value': '23'},
        {'action_type': 'page_engagement', 'value': '23'},
        {'action_type': 'post_engagement', 'value': '23'},
    ],
])

# Collect every dict as one row of a long (action_type, value) table.
df2 = pd.DataFrame(columns=['action_type', 'value'])
for _, dictList in df.iterrows():
    for _, dictAction in dictList.items():  # yields (column label, dict)
        # Shorter rows are padded with missing values, so skip cells that are not dicts.
        if isinstance(dictAction, dict):
            index = len(df2)  # next free row in df2
            for key, value in dictAction.items():
                df2.loc[index, key] = value

df2 = df2.T
print(df2.head())
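The loop above produces a long table of (action_type, value) pairs but no longer records which original row each pair came from. As an alternative (not the approach of the answer above), here is a sketch that keeps the row number and reshapes to one column per action with explode, groupby and unstack. It assumes the data sits in a single 'action' column as in the question. The names exploded, records, row and wide are illustrative, and the sample is shortened to two rows:

```python
import pandas as pd

df = pd.DataFrame({'action': [
    [
        {'action_type': 'landing_page_view', 'value': '1'},
        {'action_type': 'link_click', 'value': '1'},
    ],
    [
        {'action_type': 'video_view', 'value': '23'},
        {'action_type': 'page_engagement', 'value': '23'},
    ],
]})

# One dict per row; the original row label is repeated in the index.
exploded = df['action'].explode()

# Long table with columns 'action_type' and 'value', plus the originating row.
records = pd.DataFrame(exploded.tolist())
records['row'] = exploded.index.to_numpy()
records['value'] = pd.to_numeric(records['value'])

# Sum per (row, action_type), then move action_type into the columns.
wide = (
    records.groupby(['row', 'action_type'])['value'].sum()
           .unstack(fill_value=0)
           .add_prefix('action_')
)
print(wide)
```

Summing means that a row listing the same action_type twice gets the total rather than only the last value. Whether that is the right choice depends on the data.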
