
python shifting data in dataframes

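For the following dataframe, I want to shift data. Grouped by 'ID' and 'Series', the data in columns Q, R and T should be moved down to the row where 'Status' is 'End'.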

import pandas as pd

data = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':[9,'','',30,'',14,'',3,'',17,'','',1,''],
'R': ['',8,'','','','','','','','',7,'','',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

The result should look like this:

result = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':['','',9,'',30,'',14,'',3,'','',17,'',1],
'R': ['','',8,'','','','','','','','',7,'',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

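Use GroupBy.transform with GroupBy.first to fill every row of each ('ID', 'Series') group with that group's first non-NaN value, then blank out the repeated values with mask and duplicated, keeping only the last row of each group: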

import numpy as np

cols = ['Q', 'R', 'T']

# replace empty strings with NaNs so GroupBy.first can skip them
data[cols] = data[cols].astype(str).replace('', np.nan)
print (data)
   ID  Series Status    Q    R    T
0   A       1  Begin    9  NaN  NaN
1   A       1  Begin  NaN    8  NaN
2   A       1    End  NaN  NaN   12
3   B       1  Begin   30  NaN  NaN
4   B       1    End  NaN  NaN   38
5   B       2  Begin   14  NaN  NaN
6   B       2    End  NaN  NaN   21
7   C       1  Begin    3  NaN  NaN
8   C       1    End  NaN  NaN    6
9   C       2  Begin   17  NaN  NaN
10  C       2  Begin  NaN    7  NaN
11  C       2    End  NaN  NaN   35
12  D       1  Begin    1  NaN  NaN
13  D       1    End  NaN  NaN    5
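A small sketch of why the empty strings are converted first, using a hypothetical one-group frame `df`: GroupBy.first skips missing values only, so an empty string counts as a real value and would be picked as the group's "first" entry.

```python
import numpy as np
import pandas as pd

# Hypothetical tiny frame: one group, the real value sits in the middle row
df = pd.DataFrame({'g': [1, 1, 1], 'R': ['', '8', '']})

# GroupBy.first skips only missing values, so '' is returned as the first value
with_blanks = df.groupby('g')['R'].transform('first')

# After turning '' into NaN, first() skips the gaps and finds '8'
with_nans = df['R'].replace('', np.nan).groupby(df['g']).transform('first')

print(with_blanks.tolist())  # ['', '', '']
print(with_nans.tolist())    # ['8', '8', '8']
```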

g = data.groupby(['ID', 'Series'])
for c in cols:
    data[c] = g[c].transform('first')
print (data)
   ID  Series Status   Q    R   T
0   A       1  Begin   9    8  12
1   A       1  Begin   9    8  12
2   A       1    End   9    8  12
3   B       1  Begin  30  NaN  38
4   B       1    End  30  NaN  38
5   B       2  Begin  14  NaN  21
6   B       2    End  14  NaN  21
7   C       1  Begin   3  NaN   6
8   C       1    End   3  NaN   6
9   C       2  Begin  17    7  35
10  C       2  Begin  17    7  35
11  C       2    End  17    7  35
12  D       1  Begin   1  NaN   5
13  D       1    End   1  NaN   5
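Why transform rather than agg: a minimal comparison on a hypothetical frame `df`. agg('first') returns one value per group, while transform('first') returns a result aligned to the original index, which is what allows it to be assigned straight back to a column as in the loop above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two (ID, Series) groups
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'B', 'B'],
    'Series': [1, 1, 1, 1, 1],
    'Q': ['9', np.nan, np.nan, '30', np.nan],
})
g = df.groupby(['ID', 'Series'])['Q']

# agg: one row per group, indexed by the group keys
per_group = g.agg('first')

# transform: same length and index as df, so it can be assigned back as a column
per_row = g.transform('first')

print(per_group.tolist())  # ['9', '30']
print(per_row.tolist())    # ['9', '9', '9', '30', '30']
```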

data[cols] = data[cols].mask(data.duplicated(['ID','Series'], keep='last'), '').fillna('')
print (data)

   ID  Series Status   Q  R   T
0   A       1  Begin           
1   A       1  Begin           
2   A       1    End   9  8  12
3   B       1  Begin           
4   B       1    End  30     38
5   B       2  Begin           
6   B       2    End  14     21
7   C       1  Begin           
8   C       1    End   3      6
9   C       2  Begin           
10  C       2  Begin           
11  C       2    End  17  7  35
12  D       1  Begin           
13  D       1    End   1      5
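Note that duplicated(['ID','Series'], keep='last') flags every row except the last one in each group, so this approach relies on the 'End' row being the last row of its group, as it is in the sample data. If that ordering might not hold, one possible variant (not part of the original answer, and assuming each group has exactly one 'End' row) keeps the values on the rows whose Status is 'End' instead:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'],
    'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
    'Status': ['Begin','Begin','End','Begin','End','Begin','End',
               'Begin','End','Begin','Begin','End','Begin','End'],
    'Q': [9,'','',30,'',14,'',3,'',17,'','',1,''],
    'R': ['',8,'','','','','','','','',7,'','',''],
    'T': ['','',12,'',38,'',21,'',6,'','',35,'',5],
})
cols = ['Q', 'R', 'T']

# Treat '' as missing (via str, as in the answer) so GroupBy.first can skip it
vals = data[cols].astype(str).replace('', np.nan)

# Broadcast each (ID, Series) group's first non-missing value to all its rows
filled = vals.groupby([data['ID'], data['Series']]).transform('first')

# Keep values only where Status is 'End'; blank out every other row
is_end = data['Status'].eq('End')
data[cols] = filled.where(is_end, '').fillna('')

print(data)
```

Values come out as strings (for example '9'), the same as in the answer above, because of the astype(str) step.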


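Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM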