Data frame: convert and combine rows
I have a time-series data frame from a machine in which the values come from different tags, and some of the tags use a different format.
| datetime | tagid | value |
|---------------------|--------|-------|
| 08-04-2021 11:30:58 | BNO_01 | 12849 |
| 08-04-2021 11:30:58 | BNO_02 | 12597 |
| 08-04-2021 11:30:58 | BNO_03 | 14390 |
| 08-04-2021 11:30:58 | MDL_01 | 21328 |
| 08-04-2021 11:30:58 | MDL_02 | 22304 |
| 08-04-2021 11:30:58 | SEQ_01 | 12340 |
| 08-04-2021 11:30:58 | SEQ_02 | 13622 |
| 08-04-2021 11:30:58 | STA | 724 |
| 08-04-2021 11:30:58 | STO | 735 |
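For tag IDs BNO_01, BNO_02, BNO_03, MDL_01, MDL_02, SEQ_01 and SEQ_02, convert the value using
df['value'] = df['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256)) but only for the tag rows listed above.
Delete rows MDL_01, MDL_02, BNO_01, BNO_02 and BNO_03 and merge their text into a single BNO row.
Delete rows SEQ_01 and SEQ_02 and merge their text into a single SEQ row.
Example: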
MDL_01= 21328 --> 'SP'
MDL_02= 22304 --> 'W ' (W followed by a space, since 22304 % 256 = 32)
BNO_01= 12849 --> '21'
BNO_02= 12597 --> '15'
BNO_03= 14390 --> '86'
BNO = 'SPW 211586'
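The conversion in the example treats each value as a 16-bit word holding two ASCII characters, high byte first. A minimal sketch of that decoding follows. `decode_word` is a hypothetical helper name, and it uses `divmod` instead of the `round(x / 256)` from the question, on the assumption that integer division is what is intended (rounding can pick the wrong high byte once the low byte reaches 128).

```python
def decode_word(word: int) -> str:
    """Split a 16-bit word into two characters: high byte, then low byte."""
    high, low = divmod(word, 256)
    return chr(high) + chr(low)


# Values copied from the question's table
examples = {
    "MDL_01": 21328,
    "MDL_02": 22304,
    "BNO_01": 12849,
    "BNO_02": 12597,
    "BNO_03": 14390,
}

decoded = {tag: decode_word(value) for tag, value in examples.items()}
bno_text = "".join(decoded.values())
print(decoded)
print(repr(bno_text))
```

For all values in the sample table both variants give the same characters, because every low byte is below 128.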
Required DataFrame:
| datetime | tagid | value |
|---------------------|-------|------------|
| 08-04-2021 11:30:58 | BNO | SPW 211586 |
| 08-04-2021 11:30:58 | SEQ | 0456 |
| 08-04-2021 11:30:58 | STA | 724 |
| 08-04-2021 11:30:58 | STO | 735 |
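The idea is to select the rows whose tag starts with one of the prefixes using Series.str.startswith and boolean indexing, convert their values with a lambda function, strip the suffix from the tag with split, sort, replace MDL with BNO, and aggregate the values with join. Finally, concat the result with the original rows that do not match the condition, selected by inverting the mask with ~.
The advantage of this solution is that non-matching rows are left unchanged: duplicated tags such as two STA rows are never aggregated, and their values are not converted to strings.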
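To make the mask concrete, here is a small sketch that rebuilds the sample frame from the question and splits it in two. The variable names `to_decode` and `untouched` are only illustrative and do not appear in the answer's code.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'datetime': ['08-04-2021 11:30:58'] * 9,
    'tagid': ['BNO_01', 'BNO_02', 'BNO_03', 'MDL_01', 'MDL_02',
              'SEQ_01', 'SEQ_02', 'STA', 'STO'],
    'value': [12849, 12597, 14390, 21328, 22304, 12340, 13622, 724, 735],
})

# str.startswith accepts a tuple of prefixes
mask = df['tagid'].str.startswith(('BNO', 'MDL', 'SEQ'))

to_decode = df[mask]    # rows that need decoding and merging
untouched = df[~mask]   # rows kept exactly as they are

print(to_decode['tagid'].tolist())
print(untouched['tagid'].tolist())
```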
import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])
vals = ['BNO', 'MDL', 'SEQ']
# rows whose tag starts with one of the prefixes
mask = df['tagid'].str.startswith(tuple(vals))
df1 = df[mask].copy()
# decode each value into two characters (high byte, then low byte)
df1['value'] = df1['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
df1['tagid'] = df1['tagid'].str.split('_').str[0]
# a stable descending sort puts MDL before BNO and keeps _01, _02, ... in order
df1 = (df1.sort_values('tagid', ascending=False, kind='mergesort')
          .replace({'MDL': 'BNO'})
          .groupby(['datetime', 'tagid'])['value']
          .agg(''.join)
          .reset_index())
df = pd.concat([df1, df[~mask]], ignore_index=True)
print(df)
datetime tagid value
0 2021-08-04 11:30:58 BNO SPW 211586
1 2021-08-04 11:30:58 SEQ 0456
2 2021-08-04 11:30:58 STA 724
3 2021-08-04 11:30:58 STO 735
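The claim about duplicates can be checked with a short sketch. `combine_tags` is a hypothetical wrapper around the steps above, not part of the answer. It uses `x // 256` and an explicitly stable sort as assumptions about the intended behaviour, and the second STA row is invented for the test.

```python
import pandas as pd


def combine_tags(df: pd.DataFrame) -> pd.DataFrame:
    """Decode and merge BNO/MDL/SEQ rows, leaving all other rows untouched."""
    mask = df['tagid'].str.startswith(('BNO', 'MDL', 'SEQ'))
    part = df[mask].copy()
    part['value'] = part['value'].apply(lambda x: chr(x // 256) + chr(x % 256))
    part['tagid'] = part['tagid'].str.split('_').str[0]
    part = (part.sort_values('tagid', ascending=False, kind='mergesort')
                .replace({'MDL': 'BNO'})
                .groupby(['datetime', 'tagid'])['value']
                .agg(''.join)
                .reset_index())
    return pd.concat([part, df[~mask]], ignore_index=True)


sample = pd.DataFrame({
    'datetime': ['08-04-2021 11:30:58'] * 10,
    'tagid': ['BNO_01', 'BNO_02', 'BNO_03', 'MDL_01', 'MDL_02',
              'SEQ_01', 'SEQ_02', 'STA', 'STA', 'STO'],  # STA duplicated on purpose
    'value': [12849, 12597, 14390, 21328, 22304, 12340, 13622, 724, 724, 735],
})

out = combine_tags(sample)
print(out)
```

Both STA rows survive as separate rows with integer values, because they never pass through the groupby.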
First, convert the value column to characters for the rows whose tagid contains _.
Then remove the _ suffix from the tagid column.
mask = df['tagid'].str.contains('_')
df['value'] = df['value'].astype(object)  # let the column hold strings next to integers
df.loc[mask, 'value'] = df.loc[mask, 'value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
df['tagid'] = df['tagid'].str.split('_').str[0]
# print(df)
datetime tagid value
0 08-04-2021 11:30:58 BNO 21
1 08-04-2021 11:30:58 BNO 15
2 08-04-2021 11:30:58 BNO 86
3 08-04-2021 11:30:58 MDL SP
4 08-04-2021 11:30:58 MDL W
5 08-04-2021 11:30:58 SEQ 04
6 08-04-2021 11:30:58 SEQ 56
7 08-04-2021 11:30:58 STA 724
8 08-04-2021 11:30:58 STO 735
Next, groupby() the datetime and tagid columns and join the value entries of each group with ''.
df_ = df.groupby(['datetime','tagid']).apply(lambda x: ''.join(map(str, x['value'].tolist()))).reset_index().rename({0: 'value'}, axis=1)
print(df_)
datetime tagid value
0 08-04-2021 11:30:58 BNO 211586
1 08-04-2021 11:30:58 MDL SPW
2 08-04-2021 11:30:58 SEQ 0456
3 08-04-2021 11:30:58 STA 724
4 08-04-2021 11:30:58 STO 735
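The same grouping can also be sketched by selecting the value column and calling agg, which keeps the column name and so needs no rename. This is an alternative spelling, not the answer's own code, and the small frame below is a shortened stand-in for the intermediate result shown earlier.

```python
import pandas as pd

# Shortened stand-in for the frame after decoding and stripping the suffixes
df = pd.DataFrame({
    'datetime': ['08-04-2021 11:30:58'] * 4,
    'tagid': ['SEQ', 'SEQ', 'STA', 'STO'],
    'value': ['04', '56', 724, 735],
})

joined = (df.groupby(['datetime', 'tagid'])['value']
            .agg(lambda s: ''.join(map(str, s)))  # str() because STA/STO are integers
            .reset_index())
print(joined)
```

Note that, as in the answer, every value ends up as a string here, including STA and STO.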
Finally, combine the BNO and MDL rows and delete the MDL row.
# The MDL text already ends with a space ('SPW '), so no extra separator is added
df_.loc[df_['tagid'] == 'BNO', 'value'] = df_.loc[df_['tagid'] == 'MDL', 'value'].iloc[0] + df_.loc[df_['tagid'] == 'BNO', 'value'].iloc[0]
df_ = df_[~(df_['tagid'] == 'MDL')]
# print(df_)
datetime tagid value
0 08-04-2021 11:30:58 BNO SPW 211586
2 08-04-2021 11:30:58 SEQ 0456
3 08-04-2021 11:30:58 STA 724
4 08-04-2021 11:30:58 STO 735
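The final step above reads the MDL and BNO text with .iloc[0], so it covers a single timestamp. If the frame held several timestamps, one possible sketch is to look up the MDL text per datetime. The second timestamp and its values below are invented for illustration, and the approach assumes at most one MDL row per datetime.

```python
import pandas as pd

# Grouped frame with two timestamps (the second one is made up)
df_ = pd.DataFrame({
    'datetime': ['08-04-2021 11:30:58'] * 3 + ['08-04-2021 11:31:58'] * 2,
    'tagid': ['BNO', 'MDL', 'STA', 'BNO', 'MDL'],
    'value': ['211586', 'SPW ', '724', '211587', 'SPW '],
})

# MDL text indexed by datetime (assumes one MDL row per datetime)
mdl = df_[df_['tagid'] == 'MDL'].set_index('datetime')['value']

is_bno = df_['tagid'] == 'BNO'
# Look up the matching MDL text; BNO rows without one get an empty prefix
prefix = df_.loc[is_bno, 'datetime'].map(mdl).fillna('')
df_.loc[is_bno, 'value'] = prefix + df_.loc[is_bno, 'value']

df_ = df_[df_['tagid'] != 'MDL'].reset_index(drop=True)
print(df_)
```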