
Create a new column based on other columns and a dictionary

Let's suppose I have a DataFrame with at least two columns, col1 and col2. I also have a dictionary of dictionaries whose outer keys are the values in col1 and whose inner keys are the values in col2.

import pandas as pd
dict_of_dicts = {'x0': {'y0': 1, 'y1': 2, 'y2': 3}, 'x1': {'y0': 0, 'y1': 0, 'y2': 1}, 'x2': {'y0': 2, 'y1': 1, 'y2': 3}}
df = pd.DataFrame( {'col1': ['x1', 'x2', 'x2'], 'col2': ['y0', 'y1', 'y0']} )
print(df)
  col1 col2
0   x1   y0
1   x2   y1
2   x2   y0

Now I want to create a third column that contains the value from my dictionary, looked up with the keys given by col1 and col2 in the same row. Something like:

df['col3'] = dict_of_dicts[df['col1']][df['col2']]  # pseudocode: a whole Series cannot be used as a dict key

The result should look like this:

  col1 col2  col3
0   x1   y0     0
1   x2   y1     1
2   x2   y0     2

It should be similar to map, as explained in "Adding a new pandas column with mapped value from a dictionary", but my lookup depends on two columns instead of one. Could anybody help me with that, please?

By the way, I don't actually have to use a dictionary of dictionaries. I could also use a table (a DataFrame) with one set of keys as the index and the other set of keys as the column names. But there, too, I don't know how to access the specific "cell" given by the values in col1 and col2.

I hope my problem is clear.

Thank you, Nadja

I think a simple pandas.DataFrame.apply with an anonymous function should work fine:

df.apply(lambda x: dict_of_dicts[x.col1][x.col2], axis=1)
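A self-contained version of the apply approach that assigns the result to col3, using the data from the question. The attribute access x.col1 only works because the column names are valid Python identifiers; x['col1'] is the general form. Note that apply with axis=1 calls the lambda once per row in Python, so it is typically slower than vectorized approaches on large frames.

```python
import pandas as pd

dict_of_dicts = {'x0': {'y0': 1, 'y1': 2, 'y2': 3},
                 'x1': {'y0': 0, 'y1': 0, 'y2': 1},
                 'x2': {'y0': 2, 'y1': 1, 'y2': 3}}
df = pd.DataFrame({'col1': ['x1', 'x2', 'x2'], 'col2': ['y0', 'y1', 'y0']})

# axis=1 passes each row as a Series, so x.col1 / x.col2 are that row's values;
# a (col1, col2) pair missing from the dict raises KeyError here
df['col3'] = df.apply(lambda x: dict_of_dicts[x.col1][x.col2], axis=1)
print(df)
```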

Are you sure your desired output is correct? x1 - y0 is 0 in your table. If so, the following works and uses only built-in pandas functions, which will hopefully be well optimized:

df2 = pd.DataFrame(dict_of_dicts)
df2 = df2.unstack().reset_index()
df.merge(df2, left_on=['col1', 'col2'], right_on=['level_0', 'level_1'], how='left')

Which will result in:

  col1 col2 level_0 level_1  0
0   x1   y0      x1      y0  0
1   x2   y1      x2      y1  1
2   x2   y0      x2      y0  2
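If you only want col3 in the result, one possible variant of the merge (a sketch, not the only way) names the index levels and the value column before merging, so the extra level_0, level_1 and 0 columns never appear. It relies on DataFrame.unstack returning a Series with a (column, index) MultiIndex when the index is not a MultiIndex; the name lookup is just a helper chosen for this example.

```python
import pandas as pd

dict_of_dicts = {'x0': {'y0': 1, 'y1': 2, 'y2': 3},
                 'x1': {'y0': 0, 'y1': 0, 'y2': 1},
                 'x2': {'y0': 2, 'y1': 1, 'y2': 3}}
df = pd.DataFrame({'col1': ['x1', 'x2', 'x2'], 'col2': ['y0', 'y1', 'y0']})

# Long table with one row per (outer key, inner key) pair
lookup = (
    pd.DataFrame(dict_of_dicts)        # index: inner keys (y*), columns: outer keys (x*)
    .unstack()                         # Series indexed by (outer, inner)
    .rename_axis(['col1', 'col2'])     # name the two index levels like df's columns
    .rename('col3')                    # name the values column
    .reset_index()                     # -> columns col1, col2, col3
)

# Left merge keeps df's row order; pairs not in the dict would get NaN
result = df.merge(lookup, on=['col1', 'col2'], how='left')
print(result)
```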

You can also use a list comprehension, like this:

df['col3'] = [dict_of_dicts[x][y] for x, y in zip(df['col1'], df['col2'])]
print(df)

  col1 col2  col3
0   x1   y0     0
1   x2   y1     1
2   x2   y0     2
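For the "by the way" part of the question, where the lookup is a DataFrame (inner keys as index, outer keys as columns), a single cell can be read with DataFrame.at[row_label, column_label]. For a whole column of lookups, one option is to translate the labels into integer positions with Index.get_indexer and index the underlying NumPy array. This is a sketch that assumes every value in col1 and col2 exists in the table: get_indexer returns -1 for a missing label, which NumPy would silently treat as "last element", so the example raises explicitly in that case. (Older answers use DataFrame.lookup for this, but it was deprecated in pandas 1.2 and removed in 2.0.)

```python
import pandas as pd

dict_of_dicts = {'x0': {'y0': 1, 'y1': 2, 'y2': 3},
                 'x1': {'y0': 0, 'y1': 0, 'y2': 1},
                 'x2': {'y0': 2, 'y1': 1, 'y2': 3}}
df = pd.DataFrame({'col1': ['x1', 'x2', 'x2'], 'col2': ['y0', 'y1', 'y0']})

# Lookup table: index = inner keys (y0..y2), columns = outer keys (x0..x2)
table = pd.DataFrame(dict_of_dicts)

# Single cell: .at takes (row label, column label)
print(table.at['y0', 'x1'])

# Whole column at once: labels -> integer positions (-1 means "not found")
rows = table.index.get_indexer(df['col2'])
cols = table.columns.get_indexer(df['col1'])
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some col1/col2 values are not in the lookup table")

# Fancy indexing picks one cell per (row, col) position pair
df['col3'] = table.to_numpy()[rows, cols]
print(df)
```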
