Pandas - Create df from list of dicts

I have data in the following format (a list of dicts, each of which contains a list of 3 lists):

[{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
 {40257: [['2018-07-03T13:47:55',
    '2018-07-03T14:21:52',
    '2018-07-04T11:56:44'],
   ['Open', 'In Progress', 'Waiting on 3rd Party'],
   ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
 {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
 {40250: [[], [], []]}]

I would like the above to be converted to the following df:

key    List1-1              List1-2              List1-3              List2-1  List2-2        List2-3                 List3-1        List3-2                 List3-3
40258  2018-07-03T14:13:41  nan                  nan                  'Open'   nan            nan                     'Closed'       nan                     nan
40257  2018-07-03T13:47:55  2018-07-03T14:21:52  2018-07-04T11:56:44  'Open'   'In Progress'  'Waiting on 3rd Party'  'In Progress'  'Waiting on 3rd Party'  'In Progress'
40255  2018-07-03T13:12:58  nan                  nan                  'Open'   nan            nan                     'Closed'       nan                     nan
40250  nan                  nan                  nan                  nan      nan            nan                     nan            nan                     nan

  • Each key is a row and each element of the inner lists is a column.
  • The outer list contains 50,000 dicts to be made into rows.
  • There are always exactly 3 inner lists.
  • The inner lists are of variable length, ranging from 0 up to a maximum of 25.

I have tried a plain pd.DataFrame and pd.DataFrame.from_dict, but I can't find a solution that deals with multiple lists inside each dict.

Any help is much appreciated.

import numpy as np
import pandas as pd

data = [{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
 {40257: [['2018-07-03T13:47:55',
     '2018-07-03T14:21:52',
     '2018-07-04T11:56:44'],
    ['Open', 'In Progress', 'Waiting on 3rd Party'],
    ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
  {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
  {40250: [[], [], []]}]

# Pad each inner list with NaN up to 3 entries (the longest list in this sample;
# longer lists would need a larger width and more column names)
f = lambda x: x + [np.nan] * (3 - len(x))
# One row per key: the key followed by the three padded lists, flattened
mod_data = [[k] + sum(list(map(f, v)), []) for d in data for k, v in d.items()]

cols = ['key', 'List1-1', 'List1-2', 'List1-3', 'List2-1', 'List2-2', 'List2-3', 'List3-1', 'List3-2', 'List3-3']
df = pd.DataFrame(mod_data, columns=cols).set_index('key')
print(df)

Output:

                   List1-1              List1-2              List1-3 List2-1      List2-2               List2-3      List3-1               List3-2      List3-3
key                                                                                                                                                            
40258  2018-07-03T14:13:41                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
40257  2018-07-03T13:47:55  2018-07-03T14:21:52  2018-07-04T11:56:44    Open  In Progress  Waiting on 3rd Party  In Progress  Waiting on 3rd Party  In Progress
40255  2018-07-03T13:12:58                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
40250                  NaN                  NaN                  NaN     NaN          NaN                   NaN          NaN                   NaN          NaN
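
To make the padding and flattening steps easier to follow, here is a minimal sketch that applies them to a single record from the question. The names pad and WIDTH are hypothetical, and itertools.chain.from_iterable is swapped in for sum(..., []) as one possible way to flatten; it is not part of the answer above.

```python
from itertools import chain
from math import nan

WIDTH = 3  # assumed padding width; the question allows inner lists of up to 25


def pad(values, width=WIDTH):
    """Right-pad a list with NaN so that it has `width` entries."""
    return values + [nan] * (width - len(values))


# One record taken from the question's sample data
record = {40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]}

# Each row is the key followed by the three padded lists laid end to end
rows = [
    [key] + list(chain.from_iterable(pad(inner) for inner in lists))
    for key, lists in record.items()
]

print(rows[0])
```

Each row ends up with 1 + 3 * WIDTH entries, which is why the column list needs one name for the key plus WIDTH names per inner list.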

Creating a list of lists and then building the df with pd.DataFrame(data, columns=columns) seems to be the easiest option.

import numpy as np
import pandas as pd

# `records` is the list of single-key dicts from the question.
# First calculate the length of the longest inner list; let that be lmax.
lmax = max(len(inner) for elem in records for lists in elem.values() for inner in lists)

data = []
for elem in records:
    for key, lst in elem.items():  # note that each dict holds only one key
        # one slot for the key plus lmax slots for each of the 3 lists
        data_curr = [np.nan] * (3 * lmax + 1)
        data_curr[0] = key
        for j, inner in enumerate(lst):  # j = 0, 1, 2
            for i, value in enumerate(inner):
                data_curr[j * lmax + i + 1] = value
        data.append(data_curr)

columns = ['key'] + ['List{}-{}'.format(j + 1, i + 1) for j in range(3) for i in range(lmax)]
df = pd.DataFrame(data, columns=columns)

I hope you get the idea.

Figured I'd share my solution anyway:

import pandas as pd
from numpy import nan
mess = [{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
 {40257: [['2018-07-03T13:47:55',
    '2018-07-03T14:21:52',
    '2018-07-04T11:56:44'],
   ['Open', 'In Progress', 'Waiting on 3rd Party'],
   ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
 {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
 {40250: [[], [], []]}]

master = dict()
for dicto in mess:
    key = list(dicto.keys())[0]
    master[key] = {('List{}-{}'.format(j+1,i+1)): (dicto[key][j][i] if i < len(dicto[key][j]) else nan ) for i in range(3) for j in range(3)}
output = pd.DataFrame.from_records(master, columns=list(master.keys())).T
print(output.to_string())

Output:

                   List1-1              List1-2              List1-3 List2-1      List2-2               List2-3      List3-1               List3-2      List3-3
40258  2018-07-03T14:13:41                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
40257  2018-07-03T13:47:55  2018-07-03T14:21:52  2018-07-04T11:56:44    Open  In Progress  Waiting on 3rd Party  In Progress  Waiting on 3rd Party  In Progress
40255  2018-07-03T13:12:58                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
40250                  NaN                  NaN                  NaN     NaN          NaN                   NaN          NaN                   NaN          NaN
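
Since the question mentions pd.DataFrame.from_dict, here is a hedged sketch of how it might be applied once per inner list, with the three pieces joined side by side. The names flat, parts and part are hypothetical, and this is only one possible approach, not the method of any answer above. It relies on from_dict with orient='index' filling ragged lists with missing values, so the column count follows the longest list rather than a fixed 3.

```python
import pandas as pd

data = [{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
        {40257: [['2018-07-03T13:47:55',
                  '2018-07-03T14:21:52',
                  '2018-07-04T11:56:44'],
                 ['Open', 'In Progress', 'Waiting on 3rd Party'],
                 ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
        {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
        {40250: [[], [], []]}]

# Merge the single-key dicts into one {key: [list1, list2, list3]} mapping
flat = {key: lists for record in data for key, lists in record.items()}

parts = []
for j in range(3):  # exactly 3 inner lists per key, per the question
    # One frame per inner-list position; shorter lists are filled with missing values
    part = pd.DataFrame.from_dict(
        {key: lists[j] for key, lists in flat.items()}, orient='index'
    )
    part.columns = ['List{}-{}'.format(j + 1, i + 1) for i in range(part.shape[1])]
    parts.append(part)

# All parts share the same index (the keys), so they line up row by row
df = pd.concat(parts, axis=1)
print(df.to_string())
```

The missing entries may show up as None or NaN depending on the pandas version; pd.isna treats both as missing.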
