I want to split the data from a DataFrame into multiple DataFrames or groups. The input data is:
id channel path
15 direct a1
15 direct a2
15 direct a3
15 direct a4
213 paid b2
213 paid b1
2222 direct as25
2222 direct dw46
2222 direct 32q
3111 paid d32a
3111 paid 23ff
3111 paid www32
3111 paid 2d2
The desired output should look like this:
id channel p1 p2
213 paid b2 b1
id channel p1 p2 p3
2222 direct as25 dw46 32q
id channel p1 p2 p3 p4
15 direct a1 a2 a3 a4
3111 paid d32a 23ff www32 2d2
Please tell me how I can achieve this. Thanks.
I think you can first create a helper column cols with cumcount and then reshape with pivot_table. Then you need to find the number of notnull columns in each row (subtract the first 2, id and channel) and groupby by this length. Finally, use dropna to remove the empty columns in each group:
df['cols'] = 'p' + (df.groupby('id')['id'].cumcount() + 1).astype(str)
df1 = df.pivot_table(index=['id', 'channel'],
                     columns='cols',
                     values='path',
                     aggfunc='first').reset_index().rename_axis(None, axis=1)
print(df1)
id channel p1 p2 p3 p4
0 15 direct a1 a2 a3 a4
1 213 paid b2 b1 None None
2 2222 direct as25 dw46 32q None
3 3111 paid d32a 23ff www32 2d2
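For anyone who wants to reproduce the reshaping step on its own, here is a minimal self-contained sketch. It types in the question's input by hand (the DataFrame constructor call is my assumption about how the data is loaded) and assumes Python 3 with a reasonably recent pandas, where the missing cells may print as NaN rather than None.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    "id": [15, 15, 15, 15, 213, 213, 2222, 2222, 2222, 3111, 3111, 3111, 3111],
    "channel": ["direct"] * 4 + ["paid"] * 2 + ["direct"] * 3 + ["paid"] * 4,
    "path": ["a1", "a2", "a3", "a4", "b2", "b1", "as25", "dw46", "32q",
             "d32a", "23ff", "www32", "2d2"],
})

# Number the rows within each id and turn the position into a label: p1, p2, ...
df["cols"] = "p" + (df.groupby("id").cumcount() + 1).astype(str)

# One row per (id, channel), one column per position label.
df1 = (
    df.pivot_table(index=["id", "channel"], columns="cols",
                   values="path", aggfunc="first")
      .reset_index()
      .rename_axis(None, axis=1)
)
print(df1)
```

One thing to keep in mind: the position labels are strings, so with ten or more paths per id the column p10 would sort before p2 and the columns would need reordering.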
print(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))
0 4
1 2
2 3
3 4
dtype: int64
for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1)):
    print(i)
    print(g.dropna(axis=1))
2
id channel p1 p2
1 213 paid b2 b1
3
id channel p1 p2 p3
2 2222 direct as25 dw46 32q
4
id channel p1 p2 p3 p4
0 15 direct a1 a2 a3 a4
3 3111 paid d32a 23ff www32 2d2
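The per-row count can probably also be computed without a row-wise apply, by dropping the two key columns and summing notna() across each row. The sketch below assumes the key columns are named id and channel as above; df1 is typed in by hand so the snippet runs on its own, and n_paths is just a name I picked for illustration.

```python
import numpy as np
import pandas as pd

# The wide frame from the pivot step, written out by hand so that this
# snippet does not depend on the earlier code.
df1 = pd.DataFrame({
    "id": [15, 213, 2222, 3111],
    "channel": ["direct", "paid", "direct", "paid"],
    "p1": ["a1", "b2", "as25", "d32a"],
    "p2": ["a2", "b1", "dw46", "23ff"],
    "p3": ["a3", np.nan, "32q", "www32"],
    "p4": ["a4", np.nan, np.nan, "2d2"],
})

# Count the filled path columns per row, ignoring id and channel.
n_paths = df1.drop(columns=["id", "channel"]).notna().sum(axis=1)

# Group on that count and drop the columns that are empty within each group.
for n, g in df1.groupby(n_paths):
    print(n)
    print(g.dropna(axis=1))
```

This gives the same grouping as the apply version on this data; dropping the key columns by name just avoids the hard-coded "- 2".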
To store the groups, you can use a dictionary of DataFrames keyed by the number of paths:
dfs = {i: g.dropna(axis=1)
       for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))}
# select the DataFrame whose rows have 2 paths
print(dfs[2])
id channel p1 p2
1 213 paid b2 b1
# select the DataFrame whose rows have 3 paths
print(dfs[3])
id channel p1 p2 p3
2 2222 direct as25 dw46 32q