Python: shifting data in pandas DataFrames

Given the following DataFrame, I would like to shift data within groups: for each ('ID', 'Series') group, the values in columns Q, R and T should be moved down to the row where 'Status' is 'End'.

import pandas as pd

data = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':[9,'','',30,'',14,'',3,'',17,'','',1,''],
'R': ['',8,'','','','','','','','',7,'','',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

The result should be as follows:

result = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':['','',9,'',30,'',14,'',3,'','',17,'',1],
'R': ['','',8,'','','','','','','','',7,'',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

Use GroupBy.transform with 'first' to broadcast the first non-NaN value of each group to all of its rows, then blank out every row except the last one in each group with DataFrame.mask and DataFrame.duplicated:

import numpy as np

cols = ['Q', 'R', 'T']

# replace empty strings with NaN (astype(str) also turns the numbers into strings)
data[cols] = data[cols].astype(str).replace('', np.nan)
print(data)
   ID  Series Status    Q    R    T
0   A       1  Begin    9  NaN  NaN
1   A       1  Begin  NaN    8  NaN
2   A       1    End  NaN  NaN   12
3   B       1  Begin   30  NaN  NaN
4   B       1    End  NaN  NaN   38
5   B       2  Begin   14  NaN  NaN
6   B       2    End  NaN  NaN   21
7   C       1  Begin    3  NaN  NaN
8   C       1    End  NaN  NaN    6
9   C       2  Begin   17  NaN  NaN
10  C       2  Begin  NaN    7  NaN
11  C       2    End  NaN  NaN   35
12  D       1  Begin    1  NaN  NaN
13  D       1    End  NaN  NaN    5

g = data.groupby(['ID', 'Series'])
for c in cols:
    # broadcast the first non-NaN value of each group to every row of that group
    data[c] = g[c].transform('first')
print(data)
   ID  Series Status   Q    R   T
0   A       1  Begin   9    8  12
1   A       1  Begin   9    8  12
2   A       1    End   9    8  12
3   B       1  Begin  30  NaN  38
4   B       1    End  30  NaN  38
5   B       2  Begin  14  NaN  21
6   B       2    End  14  NaN  21
7   C       1  Begin   3  NaN   6
8   C       1    End   3  NaN   6
9   C       2  Begin  17    7  35
10  C       2  Begin  17    7  35
11  C       2    End  17    7  35
12  D       1  Begin   1  NaN   5
13  D       1    End   1  NaN   5
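
The step above relies on GroupBy.first skipping missing values. A minimal sketch on a toy frame may make that easier to see; the names `demo`, `key`, `val` and `filled` are made up for illustration and are not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group 'x' has a leading NaN, group 'y' has no values at all
demo = pd.DataFrame({
    'key': ['x', 'x', 'x', 'y', 'y'],
    'val': [np.nan, 'a', 'b', np.nan, np.nan],
})

# 'first' skips missing values, so every row of group 'x' receives 'a';
# group 'y' has nothing to pick from and stays missing
demo['filled'] = demo.groupby('key')['val'].transform('first')
print(demo)
```

Because transform returns a result aligned to the original index, it can be assigned straight back to a column, which is what the loop over `cols` does.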

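# blank out every row except the last one of each group, then turn the remaining NaNs into empty strings
data[cols] = data[cols].mask(data.duplicated(['ID','Series'], keep='last'), '').fillna('')
print(data)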

   ID  Series Status   Q  R   T
0   A       1  Begin           
1   A       1  Begin           
2   A       1    End   9  8  12
3   B       1  Begin           
4   B       1    End  30     38
5   B       2  Begin           
6   B       2    End  14     21
7   C       1  Begin           
8   C       1    End   3      6
9   C       2  Begin           
10  C       2  Begin           
11  C       2    End  17  7  35
12  D       1  Begin           
13  D       1    End   1      5
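
Note that duplicated(..., keep='last') keeps the values on the last row of each group, so the approach above assumes the 'End' row always comes last. As a hedged variant (not the code from the answer), the same steps could be wrapped in a helper that keys on the 'Status' column instead of on row position. The function name `shift_to_end` and its parameters are hypothetical, and the sketch assumes each group carries at most one non-empty value per column:

```python
import pandas as pd

def shift_to_end(df, cols, keys=('ID', 'Series'), status_col='Status', end_label='End'):
    """Move the first non-empty value of each column in `cols` to the 'End' row(s) of its group."""
    out = df.copy()
    # Treat empty strings as missing so that 'first' can skip them
    out[cols] = out[cols].where(out[cols].ne(''))
    # Broadcast the first non-missing value of every group to all of its rows
    filled = out.groupby(list(keys))[cols].transform('first')
    # Keep the value only where Status marks the end of the group
    is_end = out[status_col].eq(end_label)
    out[cols] = filled.where(is_end, '').fillna('')
    return out

data = pd.DataFrame({
    'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'],
    'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
    'Status': ['Begin','Begin','End','Begin','End','Begin','End',
               'Begin','End','Begin','Begin','End','Begin','End'],
    'Q': [9,'','',30,'',14,'',3,'',17,'','',1,''],
    'R': ['',8,'','','','','','','','',7,'','',''],
    'T': ['','',12,'',38,'',21,'',6,'','',35,'',5],
})

shifted = shift_to_end(data, ['Q', 'R', 'T'])
print(shifted)
```

This variant skips the astype(str) conversion, so the numbers are not converted to strings, and it leaves the input frame untouched by working on a copy.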
