
Pandas new column based on condition after groupby

I have a dataset that is grouped by two columns: code and group. Sample data can be generated as follows:

import pandas as pd
# Sample dataframe
df = pd.DataFrame({'code': [12] * 5 + [20] * 5,
                  'group': ['A', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B'],
                  'options': ['x,y', 'x', 'x', 'y', 'y', 'z', 'z', 'x', 'y', 'z']})
print(df)

   code group options
0    12     A     x,y
1    12     A       x
2    12     A       x
3    12     B       y
4    12     B       y
5    20     A       z
6    20     A       z
7    20     B       x
8    20     B       y
9    20     B       z

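The first thing I want to do is generate a new column containing all the possible options for each group. I could not do it in one step, but this is what I did: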

# First generate a new column joining all the options by group in temporary strings
df['group_options'] = df.groupby(['code','group'])['options'].transform(lambda x: ','.join(x))
# Split the temporary strings on commas and keep the unique values as lists
# (splitting once avoids iterating over the characters of each option,
# which would break options longer than one character)
df['group_options'] = df['group_options'].map(lambda x: list(set(x.split(','))))

Result:

   code group options group_options
0    12     A     x,y        [x, y]
1    12     A       x        [x, y]
2    12     A       x        [x, y]
3    12     B       y           [y]
4    12     B       y           [y]
5    20     A       z           [z]
6    20     A       z           [z]
7    20     B       x     [x, z, y]
8    20     B       y     [x, z, y]
9    20     B       z     [x, z, y]

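Now I want to generate two new columns for later use, group_a_options and group_b_options. For each code, these columns should hold the group_options values of group A and group B respectively: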

   code group options group_options group_a_options group_b_options
0    12     A     x,y        [x, y]          [x, y]             [y]
1    12     A       x        [x, y]          [x, y]             [y]
2    12     A       x        [x, y]          [x, y]             [y]
3    12     B       y           [y]          [x, y]             [y]
4    12     B       y           [y]          [x, y]             [y]
5    20     A       z           [z]             [z]       [x, y, z]
6    20     A       z           [z]             [z]       [x, y, z]
7    20     B       x     [x, z, y]             [z]       [x, y, z]
8    20     B       y     [x, z, y]             [z]       [x, y, z]
9    20     B       z     [x, z, y]             [z]       [x, y, z]

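I have been trying to generate these new columns with groupby and transform, without success. How can I add a condition on the group column to the groupby to get the desired output? Any help is appreciated.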

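First, create a Series of unique values per (code, group) pair: join the values, split them on commas, deduplicate with set, and finally convert to lists: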

s = df.groupby(['code','group'])['options'].agg(lambda x: list(set(','.join(x).split(','))))
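To make the lambda easier to follow, here is a minimal sketch of what it does to a single group, using the three options values of code 12, group A from the sample data. The variable names are only for illustration.

```python
# Values of the 'options' column for the group (code=12, group='A')
group_values = ['x,y', 'x', 'x']

joined = ','.join(group_values)   # one comma-separated string: 'x,y,x,x'
tokens = joined.split(',')        # individual options: ['x', 'y', 'x', 'x']
unique = list(set(tokens))        # duplicates removed, order not guaranteed

print(joined)
print(tokens)
print(sorted(unique))             # sorted only so the printed result is stable
```

Because set does not keep insertion order, the order of the elements in each list can differ between runs.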

Then reshape with Series.unstack and change the column names:

df1 = s.unstack().add_prefix('group_').add_suffix('_options').rename(columns=str.lower)
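It may help to look at the intermediate result of this step. The sketch below rebuilds the sample data and prints the column labels before and after renaming. Using sorted() instead of list(set(...)) is an assumption made here only to get a stable element order, and the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'code': [12] * 5 + [20] * 5,
                   'group': ['A', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B'],
                   'options': ['x,y', 'x', 'x', 'y', 'y', 'z', 'z', 'x', 'y', 'z']})

# sorted() replaces list(set(...)) here purely for reproducible output
s = df.groupby(['code', 'group'])['options'].agg(
    lambda x: sorted(set(','.join(x).split(','))))

# s is indexed by (code, group); unstack() moves the 'group' level into columns
wide = s.unstack()
print(wide.columns.tolist())      # group labels before renaming

# Build the final column names from the group labels
df1 = wide.add_prefix('group_').add_suffix('_options').rename(columns=str.lower)
print(df1.columns.tolist())
print(df1.loc[20, 'group_b_options'])
```

After the reshape there is one row per code and one column per group, which is why the later join only needs the code column.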

Finally, use DataFrame.join twice: first on both columns (code and group), then on the code column alone:

df = df.join(s.rename('group_options'), on=['code','group']).join(df1, on='code')
print(df)
   code group options group_options group_a_options group_b_options
0    12     A     x,y        [y, x]          [y, x]             [y]
1    12     A       x        [y, x]          [y, x]             [y]
2    12     A       x        [y, x]          [y, x]             [y]
3    12     B       y           [y]          [y, x]             [y]
4    12     B       y           [y]          [y, x]             [y]
5    20     A       z           [z]             [z]       [y, x, z]
6    20     A       z           [z]             [z]       [y, x, z]
7    20     B       x     [y, x, z]             [z]       [y, x, z]
8    20     B       y     [y, x, z]             [z]       [y, x, z]
9    20     B       z     [y, x, z]             [z]       [y, x, z]
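The join step works because on= matches columns of the calling DataFrame against the index of the object being joined. A small sketch with made-up data (the names left and lookup and the label values are hypothetical, not part of the original answer):

```python
import pandas as pd

left = pd.DataFrame({'code': [12, 12, 20], 'group': ['A', 'B', 'A']})

# Hypothetical lookup Series indexed by (code, group), shaped like `s` above
lookup = pd.Series(
    ['first', 'second', 'third'],
    index=pd.MultiIndex.from_tuples([(12, 'A'), (12, 'B'), (20, 'A')],
                                    names=['code', 'group']),
    name='label')

# The columns listed in `on` are matched against the levels of lookup's index;
# the Series name becomes the name of the new column
out = left.join(lookup, on=['code', 'group'])
print(out)
```

The Series must have a name for join to work, which is why the answer calls s.rename('group_options') first.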

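If the order of the values matters, remove the duplicates with the dict.fromkeys trick instead, which keeps the first occurrence of each value in its original position: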

s = (df.groupby(['code','group'])['options']
       .agg(lambda x: list(dict.fromkeys(','.join(x).split(',')))))

df1 = s.unstack().add_prefix('group_').add_suffix('_options').rename(columns=str.lower)

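# Run this on the original df: join raises an error on overlapping column
# names if the columns from the previous snippet already exist
df = df.join(s.rename('group_options'), on=['code','group']).join(df1, on='code')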
print(df)
   code group options group_options group_a_options group_b_options
0    12     A     x,y        [x, y]          [x, y]             [y]
1    12     A       x        [x, y]          [x, y]             [y]
2    12     A       x        [x, y]          [x, y]             [y]
3    12     B       y           [y]          [x, y]             [y]
4    12     B       y           [y]          [x, y]             [y]
5    20     A       z           [z]             [z]       [x, y, z]
6    20     A       z           [z]             [z]       [x, y, z]
7    20     B       x     [x, y, z]             [z]       [x, y, z]
8    20     B       y     [x, y, z]             [z]       [x, y, z]
9    20     B       z     [x, y, z]             [z]       [x, y, z]
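The difference between the two deduplication approaches only shows up when the options are not already in sorted order. A minimal sketch with hypothetical input values (not taken from the sample data):

```python
# Hypothetical option strings for one group, deliberately out of order
group_values = ['z,x', 'y', 'x']

tokens = ','.join(group_values).split(',')   # ['z', 'x', 'y', 'x']

# dict keys are unique and, since Python 3.7, keep insertion order,
# so the first occurrence of each option stays in place
ordered_unique = list(dict.fromkeys(tokens))
print(ordered_unique)

# set() also removes duplicates but gives no guarantee about the order
unordered_unique = set(tokens)
print(sorted(unordered_unique))
```

Both approaches yield the same elements; only dict.fromkeys guarantees that they appear in order of first occurrence.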

