简体   繁体   中英

How to create a new column in a dataframe using a multikey dictionary

I have a multikey dictionary that I would like to use to create a new column in a dataframe. Doing so with a single key dataframe is quite easy but I am stumped as to what the correct syntax is to send two values to the dictionary.

I have been able to use a single key dictionary using map, get, and apply (map example)

    import pandas as pd      
    df = pd.DataFrame(data = {'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})

    single_dict = {1: 'This', 2: 'is', 3: 'pretty', 4: 'easy'}

    df['newcol_a'] = df['Col1'].map(single_dict)

    print(df)```

which returns the expected"

    Col1 Col2 newcol_a
    0     1    A     This
    1     2    B       is
    2     3    C   pretty
    3     4    D     easy

But when I create a multikey dictionary such as

dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS' , 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME'  , 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'},}

I am able to call it using 'get'

dbl_dict.get(1, {}).get('A', 'Other')
Out[5]: 'THIS'      

But I cannot figure out the syntax (tried about 40 different things, such as df['newcol_b'] = df[['Col1', 'Col2']].map(dbl_dict) ) to get the desired results:

    Col1 Col2 newcol_a
    0     1    A     THIS
    1     2    B      HAS
    2     3    C       ME
    3     4    D  STUMPED

map does not know how to handle a nested dict. If you insist on using this dict you can use apply on the entire dataframe but you'd have to create a custom mapping function:

import pandas as pd

df = pd.DataFrame(data={'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})
dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS', 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME', 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'}}

df['new_col'] = df.apply(lambda s: dbl_dict.get(s['Col1'], {}).get(s['Col2']), axis=1)

df is now

   Col1 Col2  new_col
0     1    A     THIS
1     2    B      HAS
2     3    C       ME
3     4    D  STUMPED

A solution with loc (or at ) might be possible (and if so, will probably be faster). Need to look into that.

The easiest option you have, in my opinion, is to create a new DataFrame using your nested dictionary and unstack this DataFrame, which you can then join with your original DataFrame, like so:

s = pd.DataFrame(dbl_dict).unstack().rename_axis(('Col1','Col2')).rename('new_column')
print (s)
df = df.join(s, on=['Col1','Col2'])
print (df)

I've created a tiny (2 line) custom function for you to use which seems to solve the case. Of course this can be improved to catch some errors and behaviors for specific cases.

import pandas as pd
data = {'col_1':[1,2,3,4],'col_2':['A','B','C','D']}
df = pd.DataFrame(data)
dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS' , 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME'  , 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'},}
def maperino(dict_name,key_1,key_2):
    val = [dict_name[key_1[i]][key_2[i]] for i in range(len(key_1))]
    return val        
df['col_3'] = maperino(dbl_dict,df['col_1'],df['col_2'])
print(df)

Output:

   col_1 col_2    col_3
0      1     A     THIS
1      2     B      HAS
2      3     C       ME
3      4     D  STUMPED

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM