
Slicing and arranging dataframe in pandas

I want to arrange the data from a DataFrame into multiple DataFrames or groups. The input data is:

id  channel path
15  direct  a1
15  direct  a2
15  direct  a3
15  direct  a4
213 paid    b2
213 paid    b1
2222    direct  as25
2222    direct  dw46
2222    direct  32q
3111    paid    d32a
3111    paid    23ff
3111    paid    www32
3111    paid    2d2

The desired output should look like this:

id  channel p1  p2
213 paid    b2  b1

id  channel p1  p2  p3
2222    direct  as25    dw46    32q

id  channel p1  p2  p3  p4
15  direct  a1  a2  a3  a4
3111    paid    d32a    23ff    www32   2d2

Please tell me how I can achieve this. Thanks.

I think you can first create a helper column cols with cumcount and then reshape with pivot_table. Next, count the non-null columns in each row (subtracting the first 2, id and channel) and groupby on that count. Finally, use dropna to drop the empty columns in each group:

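import pandas as pd

# df is the input DataFrame from the question (columns: id, channel, path)
# number the rows within each id: p1, p2, ...
df['cols'] = 'p' + (df.groupby('id')['id'].cumcount() + 1).astype(str)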

df1 = df.pivot_table(index=['id', 'channel'], 
                    columns='cols', 
                    values='path', 
                    aggfunc='first').reset_index().rename_axis(None, axis=1)

print(df1)
     id channel    p1    p2     p3    p4
0    15  direct    a1    a2     a3    a4
1   213    paid    b2    b1   None  None
2  2222  direct  as25  dw46    32q  None
3  3111    paid  d32a  23ff  www32   2d2

print(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))
0    4
1    2
2    3
3    4
dtype: int64
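The per-row count above can also be computed without a row-wise lambda. The following is a minimal sketch, assuming df1 has exactly the shape printed above (it is rebuilt by hand here so the snippet runs on its own; the names via_apply and vectorized are only illustrative):

```python
import pandas as pd

# df1 rebuilt by hand to match the pivoted frame printed above (assumption).
df1 = pd.DataFrame({
    "id": [15, 213, 2222, 3111],
    "channel": ["direct", "paid", "direct", "paid"],
    "p1": ["a1", "b2", "as25", "d32a"],
    "p2": ["a2", "b1", "dw46", "23ff"],
    "p3": ["a3", None, "32q", "www32"],
    "p4": ["a4", None, None, "2d2"],
})

# Row-wise apply, as in the answer.
via_apply = df1.apply(lambda x: x.notnull().sum() - 2, axis=1)

# Same idea without the lambda: count non-null cells per row,
# then subtract the two key columns (id and channel).
vectorized = df1.notnull().sum(axis=1) - 2

print(vectorized.tolist())
print(via_apply.tolist() == vectorized.tolist())
```

Either Series can be passed to groupby as the grouping key; the vectorized form simply avoids calling a Python function once per row.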

for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1)):
    print(i)
    print(g.dropna(axis=1))
2
    id channel  p1  p2
1  213    paid  b2  b1
3
     id channel    p1    p2   p3
2  2222  direct  as25  dw46  32q
4
     id channel    p1    p2     p3   p4
0    15  direct    a1    a2     a3   a4
3  3111    paid  d32a  23ff  www32  2d2

To store the results, you can use a dictionary of DataFrames:

dfs = {i: g.dropna(axis=1)
       for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))}

# select the DataFrame with len=2
print(dfs[2])
    id channel  p1  p2
1  213    paid  b2  b1

# select the DataFrame with len=3
print(dfs[3])
     id channel    p1    p2   p3
2  2222  direct  as25  dw46  32q
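For reference, here is a self-contained sketch that puts the steps together, starting from the input data in the question. The function name split_by_path_count is hypothetical (it is not part of pandas), and the sketch assumes each id has a single channel and fewer than ten paths, since column labels such as p10 would sort before p2.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    "id": [15, 15, 15, 15, 213, 213, 2222, 2222, 2222,
           3111, 3111, 3111, 3111],
    "channel": ["direct"] * 4 + ["paid"] * 2 + ["direct"] * 3 + ["paid"] * 4,
    "path": ["a1", "a2", "a3", "a4", "b2", "b1", "as25", "dw46", "32q",
             "d32a", "23ff", "www32", "2d2"],
})


def split_by_path_count(frame):
    """Hypothetical helper: return {number of paths: wide DataFrame}."""
    frame = frame.copy()  # do not modify the caller's frame
    # Number each row within its id: p1, p2, ...
    frame["cols"] = "p" + (frame.groupby("id").cumcount() + 1).astype(str)
    wide = (frame.pivot_table(index=["id", "channel"], columns="cols",
                              values="path", aggfunc="first")
                 .reset_index()
                 .rename_axis(None, axis=1))
    # Count the non-null path columns per row (everything except id, channel).
    n_paths = wide.notnull().sum(axis=1) - 2
    # One DataFrame per count, with the unused path columns dropped.
    return {n: g.dropna(axis=1) for n, g in wide.groupby(n_paths)}


dfs = split_by_path_count(df)
for n, g in dfs.items():
    print(n)
    print(g)
```

Wrapping the steps in a function keeps the helper column cols out of the original DataFrame, because the function works on a copy.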
