简体   繁体   中英

Convert a Python DataFrame into a list of dictionaries

I have a dataframe and want to convert it to a list of dictionaries. I use read_csv() to create this dataframe. The dataframe looks like the following:

             AccountName AccountType  StockName Allocation
          0  MN001       #1           ABC       0.4
          1  MN001       #1           ABD       0.6
          2  MN002       #2           EFG       0.5
          3  MN002       #2           HIJ       0.4
          4  MN002       #2           LMN       0.1 

The desired output:

   [{'ABC':0.4, 'ABD':0.6}, {'EFG':0.5, 'HIJ':0.4,'LMN':0.1}]

I have tried to research on similar topics and used the Dataframe.to_dict() function. I look forward to getting this done. Many thanks for your help!

This should do it:

portfolios = []
for _, account in df.groupby('AccountName'):
    portfolio = {stock['StockName']: stock['Allocation'] 
              for _, stock in account.iterrows()}
    portfolios.append(portfolio)

First use the groupby() function to group the rows of the dataframe by AccountName . To access the individual rows (stocks) for each account, you use the iterrows() method. As user @ebb-earl-co explained in the comments, the _ is there as a placeholder variable, because iterrows() returns (index, Series ) tuples, and we only need the Series (the rows themselves). From there, use a dict comprehension to create a dictionary mapping StockName -> Allocation for each stock. Finally, append that dictionary to the list of portfolios , resulting in the expected output:

[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]

One more thing: if you decide later that you want to label each dict in the portfolios with the account name, you could do it like this:

portfolios = []
for acct_name, account in df.groupby('AccountName'):
    portfolio = {stock['StockName']: stock['Allocation'] 
              for _, stock in account.iterrows()}
    portfolios.append({acct_name: portfolio})

This will return a list of nested dicts like this:

[{'MN001': {'ABC': 0.4, 'ABD': 0.6}},
 {'MN002': {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}}]

Note that in this case, I used the variable acct_name instead of assigning to _ because we actually will use the index to "label" the dicts in the portfolios list.

import pandas as pd 
import numpy as np 

d = np.array([['MN001','#1','ABC', 0.4],
                ['MN001','#1','ABD', 0.6],
                ['MN002', '#2', 'EFG', 0.5],
                ['MN002', '#2', 'HIJ', 0.4],
                ['MN002', '#2', 'LMN', 0.1]])  
df = pd.DataFrame(data=d, columns = ['AccountName','AccountType','StockName', 'Allocation'])

by_account_df = df.groupby('AccountName').apply(lambda x : dict(zip(x['StockName'],x['Allocation']))).reset_index(name='dic'))
by_account_lst = by_account_df['dic'].values.tolist()

And the result should be:

print(by_account_lst)
[{'ABC': '0.4', 'ABD': '0.6'}, {'EFG': '0.5', 'HIJ': '0.4', 'LMN': '0.1'}]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM