简体   繁体   中英

Vectorized method to fill dataframe column from indices from another Multiindexed dataframe?

Say I have a multiindexed dataframe df1 :

        x   y
i0  i1      
aaa a   1   6
    b   2   5
    c   3   4
bbb x   4   3
    y   5   2
    z   6   1

with a second dataframe df2 :

       foo         bar      cat
0   0.925481    0.682102    aaa
1   0.620557    0.375114    aaa
2   0.788425    0.746436    bbb
3   0.247594    0.902258    aaa
4   0.146786    0.048585    bbb
5   0.256904    0.327165    bbb

and I want to use a list of second level indices of the first dataframe to fill a column in the second dataframe df2 as an array like this:

       foo         bar      cat      catlist
0   0.925481    0.682102    aaa   ['a','b','c']
1   0.620557    0.375114    aaa   ['a','b','c']
2   0.788425    0.746436    bbb   ['x','y','z']
3   0.247594    0.902258    aaa   ['a','b','c']
4   0.146786    0.048585    bbb   ['x','y','z']
5   0.256904    0.327165    bbb   ['x','y','z']

I can do this (slowly) by iterating, but how should this be done using a vectorized pandas method?

Iterating using apply :

def iterate(row):
    return list(df1.loc[row['cat']].index.get_level_values(0))

df2['catlist'] = df2.apply(iterate, axis=1)

You can use convert your MultiIndex to dataframe using pd.MultiIndex.to_frame , then gropuby and use GroupBy.agg to list. Now that mapping in pd.Series.map

mapping = df1.index.to_frame(index=False).groupby("i0")["i1"].agg(list)
df2["catlist"] = df2["cat"].map(mapping)

        foo       bar  cat    catlist
0  0.925481  0.682102  aaa  [a, b, c]
1  0.620557  0.375114  aaa  [a, b, c]
2  0.788425  0.746436  bbb  [x, y, z]
3  0.247594  0.902258  aaa  [a, b, c]
4  0.146786  0.048585  bbb  [x, y, z]
5  0.256904  0.327165  bbb  [x, y, z]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM