Pandas - Groupby + Shift not working as expected

Question

I have a DataFrame that I am trying to perform a groupby and shift on. However, the output is not what I want.

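I want to pull the "next" DueDate back onto the rows of the previous one. So if the current DueDate is 1/1 and the next DueDate is 6/30, then for all rows where DueDate==1/1, a new column NextDueDate should hold 6/30. Likewise, when the current DueDate is 6/30, all rows where DueDate==6/30 should get the DueDate that follows 6/30.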

Original df
ID Document Date  DueDate
1  ABC      1/31  1/1  
1  ABC      2/28  1/1  
1  ABC      3/31  1/1  
1  ABC      4/30  6/30 
1  ABC      5/31  6/30 
1  ABC      6/30  7/31 
1  ABC      7/31  7/31 
1  ABC      8/31  9/30

Desired output df
ID Document Date  DueDate NextDueDate
1  ABC      1/31  1/1     6/30
1  ABC      2/28  1/1     6/30
1  ABC      3/31  1/1     6/30
1  ABC      4/30  6/30    7/31
1  ABC      5/31  6/30    7/31
1  ABC      6/30  7/31    9/30
1  ABC      7/31  7/31    9/30
1  ABC      8/31  9/30    10/31

I tried df['NextDueDate'] = df.groupby(['ID','Document'])['DueDate'].shift(-1) but it doesn't really give me what I want.
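For context, here is a minimal sketch of why the row-wise shift falls short. `shift(-1)` moves every value up by exactly one row, so inside a run of identical due dates most rows just get their own DueDate back. The DataFrame is reconstructed from the sample table above; the variable name `row_shift` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1] * 8,
    "Document": ["ABC"] * 8,
    "Date": ["1/31", "2/28", "3/31", "4/30", "5/31", "6/30", "7/31", "8/31"],
    "DueDate": ["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"],
})

# shift(-1) takes the value from the next ROW, not the next distinct date.
row_shift = df.groupby(["ID", "Document"])["DueDate"].shift(-1)
print(row_shift.tolist())
# ['1/1', '1/1', '6/30', '6/30', '7/31', '7/31', '9/30', nan]
```

Only the last row of each run picks up a new date, which is why the shift has to be applied to the distinct due dates rather than to the raw rows.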

Answer 1

Count the rows per DueDate, shift the dates up by one, then repeat each shifted date by its row count:

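# sort=False keeps the due dates in order of appearance rather than sorted as strings
s=df.groupby('DueDate',sort=False).size().reset_index(name='number')
# replace each due date with the one that follows it; the last one has no successor
s.DueDate=s.DueDate.shift(-1).fillna('10/31')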
s
Out[251]: 
  DueDate  number
0    6/30       3
1    7/31       2
2    9/30       2
3   10/31       1
s.DueDate.repeat(s.number)
Out[252]: 
0     6/30
0     6/30
0     6/30
1     7/31
1     7/31
2     9/30
2     9/30
3    10/31
Name: DueDate, dtype: object
df['Nextduedate']=s.DueDate.repeat(s.number).values
df
Out[254]: 
   ID Document  Date DueDate Nextduedate
0   1      ABC  1/31     1/1        6/30
1   1      ABC  2/28     1/1        6/30
2   1      ABC  3/31     1/1        6/30
3   1      ABC  4/30    6/30        7/31
4   1      ABC  5/31    6/30        7/31
5   1      ABC  6/30    7/31        9/30
6   1      ABC  7/31    7/31        9/30
7   1      ABC  8/31    9/30       10/31
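The session above can be collected into one runnable sketch. It assumes that rows sharing a DueDate sit next to each other (as in the sample), because `repeat` writes the values back purely by position, and it reuses the hard-coded '10/31' fill from this answer for the last date.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1] * 8,
    "Document": ["ABC"] * 8,
    "Date": ["1/31", "2/28", "3/31", "4/30", "5/31", "6/30", "7/31", "8/31"],
    "DueDate": ["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"],
})

# One row per distinct DueDate (in order of appearance) with its row count.
s = df.groupby("DueDate", sort=False).size().reset_index(name="number")
# Replace each DueDate by its successor; '10/31' is the answer's placeholder.
s["DueDate"] = s["DueDate"].shift(-1).fillna("10/31")
# Repeat every "next" date once per original row and assign by position.
df["Nextduedate"] = s["DueDate"].repeat(s["number"]).to_numpy()
print(df)
```

If the same DueDate could appear in two separate runs, the positional assignment would no longer line up, so the contiguity assumption is worth checking on real data.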

If you have multiple groups:

l = []
for _, df1 in df.groupby(["ID", "Document"]):
    df1 = df1.copy()  # work on a copy to avoid SettingWithCopyWarning
    # sort=False keeps the due dates in order of appearance within the group
    s = df1.groupby('DueDate', sort=False).size().reset_index(name='number')
    s.DueDate = s.DueDate.shift(-1).fillna('10/31')
    df1['Nextduedate'] = s.DueDate.repeat(s.number).values
    l.append(df1)

New_df = pd.concat(l)
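As an illustration of the per-group loop, the sketch below runs it on two groups. The second group (ID 2, Document "XYZ") and the helper name `add_next_due` are hypothetical and not part of the original question; the '10/31' fill is again the answer's placeholder for the last date of every group.

```python
import pandas as pd

# Two groups; the (2, "XYZ") rows are made up for illustration.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "Document": ["ABC"] * 4 + ["XYZ"] * 3,
    "DueDate": ["1/1", "1/1", "6/30", "7/31", "2/15", "2/15", "5/15"],
})

def add_next_due(group, last_fill="10/31"):
    """Hypothetical helper: count, shift and repeat within one group."""
    group = group.copy()
    s = group.groupby("DueDate", sort=False).size().reset_index(name="number")
    s["DueDate"] = s["DueDate"].shift(-1).fillna(last_fill)
    group["Nextduedate"] = s["DueDate"].repeat(s["number"]).to_numpy()
    return group

pieces = [add_next_due(g) for _, g in df.groupby(["ID", "Document"])]
new_df = pd.concat(pieces)
print(new_df)
```

Because the shift happens inside each group, the last due date of "ABC" gets the fill value instead of leaking into the first due date of "XYZ".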

Answer 2

Define a function f that performs the replacement based on the shifted dates -

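def f(x):
    i = x.drop_duplicates()           # distinct dates, in order of appearance
    j = i.shift(-1).fillna('10/30')   # each date's successor; placeholder for the last one

    return x.map(dict(zip(i, j)))     # look up the successor for every row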
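To make the mechanism concrete, this small sketch walks through the intermediate values of f for the sample DueDate column. The names `x`, `mapping` and `result` are illustrative only.

```python
import pandas as pd

# The DueDate column of the single sample group.
x = pd.Series(
    ["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"],
    name="DueDate",
)

i = x.drop_duplicates()           # distinct dates: 1/1, 6/30, 7/31, 9/30
j = i.shift(-1).fillna("10/30")   # successors:     6/30, 7/31, 9/30, 10/30
mapping = dict(zip(i, j))
print(mapping)
# {'1/1': '6/30', '6/30': '7/31', '7/31': '9/30', '9/30': '10/30'}

result = x.map(mapping)           # successor looked up for every row
print(result.tolist())
```

Since the lookup is keyed on the date value rather than on row position, this variant does not depend on equal due dates being adjacent. It does assume that each date has a single successor within its group.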

Now, call this function inside a groupby + apply on ID and Document -

# group_keys=False keeps the original row index, so the result aligns with df
df['NextDueDate'] = df.groupby(['ID', 'Document'], group_keys=False).DueDate.apply(f)
df

   ID Document  Date DueDate NextDueDate
0   1      ABC  1/31     1/1        6/30
1   1      ABC  2/28     1/1        6/30
2   1      ABC  3/31     1/1        6/30
3   1      ABC  4/30    6/30        7/31
4   1      ABC  5/31    6/30        7/31
5   1      ABC  6/30    7/31        9/30
6   1      ABC  7/31    7/31        9/30
7   1      ABC  8/31    9/30       10/30


Answer 1
2 2018-01-03 17:45:13

Answer 2
2 Accepted 2018-01-03 19:04:55
