
Data frame convert and combine rows

I have a time-series data frame from a machine. The values come from different tags, and some of the tags are in a different format.

| datetime            | tagid  | value |
|---------------------|--------|-------|
| 08-04-2021 11:30:58 | BNO_01 | 12849 |
| 08-04-2021 11:30:58 | BNO_02 | 12597 |
| 08-04-2021 11:30:58 | BNO_03 | 14390 |
| 08-04-2021 11:30:58 | MDL_01 | 21328 |
| 08-04-2021 11:30:58 | MDL_02 | 22304 |
| 08-04-2021 11:30:58 | SEQ_01 | 12340 |
| 08-04-2021 11:30:58 | SEQ_02 | 13622 |
| 08-04-2021 11:30:58 | STA    | 724   |
| 08-04-2021 11:30:58 | STO    | 735   |
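
For anyone who wants to reproduce the problem, the table above can be built as a small frame. This is only a sketch: the column names and values are copied from the table, and the `datetime` column is left as plain strings on the assumption that this is how the data arrives.

```python
import pandas as pd

# Sample rows copied from the table above; datetime is kept as a string here
df = pd.DataFrame({
    "datetime": ["08-04-2021 11:30:58"] * 9,
    "tagid": ["BNO_01", "BNO_02", "BNO_03", "MDL_01", "MDL_02",
              "SEQ_01", "SEQ_02", "STA", "STO"],
    "value": [12849, 12597, 14390, 21328, 22304, 12340, 13622, 724, 735],
})
print(df)
```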

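  1. Convert the values of the tag IDs BNO_01, BNO_02, BNO_03, MDL_01, MDL_02, SEQ_01, SEQ_02 using
    df['value'] = df['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256)), but only for the rows with the tags above

  2. Delete the rows MDL_01, MDL_02, BNO_01, BNO_02, BNO_03 and merge their text into a single BNO row

  3. Delete the rows SEQ_01, SEQ_02 and merge their text into a single SEQ row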

Example:
MDL_01= 21328 --> 'SP'
MDL_02= 22304 --> 'W ' (the low byte is 32, a space)
BNO_01= 12849 --> '21'
BNO_02= 12597 --> '15'
BNO_03= 14390 --> '86'

BNO = 'SPW 211586'
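
The conversion in the example can be checked by hand: each value is treated as a 16-bit word whose high and low bytes are character codes. A minimal sketch follows. The helper name `decode_word` is hypothetical, and it uses `divmod` instead of `round(x / 256)`. The two agree for the ASCII values shown here, but `round` can round the high byte up when the low byte is above 128.

```python
def decode_word(x: int) -> str:
    """Split a 16-bit integer into its high and low bytes and map each to a character."""
    high, low = divmod(x, 256)
    return chr(high) + chr(low)

# Values copied from the question's table
examples = {
    "MDL_01": 21328, "MDL_02": 22304,
    "BNO_01": 12849, "BNO_02": 12597, "BNO_03": 14390,
    "SEQ_01": 12340, "SEQ_02": 13622,
}
decoded = {tag: decode_word(v) for tag, v in examples.items()}

# MDL text first, then BNO text; MDL_02 decodes to 'W ' and so supplies the space
bno = "".join(decoded[t] for t in ["MDL_01", "MDL_02", "BNO_01", "BNO_02", "BNO_03"])
seq = decoded["SEQ_01"] + decoded["SEQ_02"]
print(repr(bno), repr(seq))
```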

Desired dataframe:

| datetime            | tagid | value      |
|---------------------|-------|------------|
| 08-04-2021 11:30:58 | BNO   | SPW 211586 |
| 08-04-2021 11:30:58 | SEQ   | 0456       |
| 08-04-2021 11:30:58 | STA   | 724        |
| 08-04-2021 11:30:58 | STO   | 735        |

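The idea is to first select the matching rows with Series.str.startswith and boolean indexing, decode their values with the lambda function, strip the suffix from tagid with split, sort, replace MDL with BNO, and aggregate the values with join. Finally, use concat to add back the rows of the original frame that did not match, selected with ~ to invert the mask:

The advantage of this solution is that non-matching rows are left unchanged: duplicates such as two STA rows are never aggregated, and their values are not converted to strings.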

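import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])

# rows whose tagid starts with one of these prefixes are decoded and merged
vals = ['BNO','MDL','SEQ']
mask = df['tagid'].str.startswith(tuple(vals))

df1 = df[mask].copy()
# decode each value into two characters (high byte, low byte)
df1['value'] = df1['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
# BNO_01 -> BNO, MDL_02 -> MDL, SEQ_01 -> SEQ
df1['tagid'] = df1['tagid'].str.split('_').str[0]

# descending sort puts MDL before BNO, so the MDL text comes first after the rename;
# a stable sort keeps _01, _02, ... in their original order within each prefix
df1 = (df1.sort_values('tagid', ascending=False, kind='stable')
          .replace({'MDL':'BNO'})
          .groupby(['datetime','tagid'])['value']
          .agg(''.join)
          .reset_index())

# append the rows that did not match, unchanged
df = pd.concat([df1, df[~mask]], ignore_index=True)
print (df)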
             datetime tagid       value
0 2021-08-04 11:30:58   BNO  SPW 211586
1 2021-08-04 11:30:58   SEQ        0456
2 2021-08-04 11:30:58   STA         724
3 2021-08-04 11:30:58   STO         735
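
To illustrate why the inverted mask leaves non-matching rows alone, here is a small sketch with a duplicated STA tag. The second STA row and its value 725 are invented for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical frame: two STA rows to show that duplicates are not aggregated
df = pd.DataFrame({
    "tagid": ["BNO_01", "SEQ_01", "STA", "STA", "STO"],
    "value": [12849, 12340, 724, 725, 735],
})

mask = df["tagid"].str.startswith(("BNO", "MDL", "SEQ"))

# Rows selected by ~mask never pass through the decode or groupby steps,
# so both STA rows survive and the values stay integers
untouched = df[~mask]
print(untouched)
print(untouched["value"].dtype)
```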

First, convert the value column to characters for the rows whose tagid contains _.

Then strip the _ suffix from the tagid column.

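mask = df['tagid'].str.contains('_')
# cast to object first so the column can hold both the decoded strings and the untouched integers
df['value'] = df['value'].astype(object)
df.loc[mask, 'value'] = df.loc[mask, 'value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
df['tagid'] = df['tagid'].apply(lambda x: x.split('_')[0])
# print(df)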

              datetime tagid value
0  08-04-2021 11:30:58   BNO    21
1  08-04-2021 11:30:58   BNO    15
2  08-04-2021 11:30:58   BNO    86
3  08-04-2021 11:30:58   MDL    SP
4  08-04-2021 11:30:58   MDL    W 
5  08-04-2021 11:30:58   SEQ    04
6  08-04-2021 11:30:58   SEQ    56
7  08-04-2021 11:30:58   STA   724
8  08-04-2021 11:30:58   STO   735

Next, groupby() the datetime and tagid columns and join the value column within each group using ''.

df_ = (df.groupby(['datetime','tagid'])['value']
         .agg(lambda s: ''.join(map(str, s)))
         .reset_index())
print(df_)

              datetime tagid   value
0  08-04-2021 11:30:58   BNO  211586
1  08-04-2021 11:30:58   MDL    SPW 
2  08-04-2021 11:30:58   SEQ    0456
3  08-04-2021 11:30:58   STA     724
4  08-04-2021 11:30:58   STO     735

Finally, combine the BNO and MDL rows and drop the MDL row.

# the decoded MDL text already ends with a space ('SPW '), so no extra separator is added
df_.loc[df_['tagid'] == 'BNO', 'value'] = df_.loc[df_['tagid'] == 'MDL', 'value'].iloc[0] + df_.loc[df_['tagid'] == 'BNO', 'value'].iloc[0]
df_ = df_[~(df_['tagid'] == 'MDL')]
# print(df_)

              datetime tagid       value
0  08-04-2021 11:30:58   BNO  SPW 211586
2  08-04-2021 11:30:58   SEQ        0456
3  08-04-2021 11:30:58   STA         724
4  08-04-2021 11:30:58   STO         735
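
The final step uses .iloc[0], so it only handles a frame with a single timestamp. Below is a minimal sketch of one way to extend the same idea to several timestamps. It assumes the tag layout from the question. The function name `combine_tags`, the helper columns `text`, `group` and `rank`, and the second timestamp in the sample data are all hypothetical. It sorts on explicit keys rather than relying on the original row order.

```python
import pandas as pd

def combine_tags(df):
    """Decode and merge BNO/MDL/SEQ rows per timestamp; other rows pass through."""
    prefix = df["tagid"].str.split("_").str[0]
    mask = prefix.isin(["BNO", "MDL", "SEQ"])

    part = df.loc[mask, ["datetime", "tagid"]].copy()
    # high byte and low byte of each value become one character each
    part["text"] = [chr(int(v) // 256) + chr(int(v) % 256) for v in df.loc[mask, "value"]]
    part["group"] = prefix[mask].replace({"MDL": "BNO"})   # MDL is folded into BNO
    part["rank"] = (prefix[mask] != "MDL").astype(int)     # MDL text goes first

    merged = (part.sort_values(["datetime", "group", "rank", "tagid"])
                  .groupby(["datetime", "group"], sort=False)["text"]
                  .agg("".join)
                  .reset_index()
                  .rename(columns={"group": "tagid", "text": "value"}))
    return pd.concat([merged, df[~mask]], ignore_index=True)

tags = ["BNO_01", "BNO_02", "BNO_03", "MDL_01", "MDL_02", "SEQ_01", "SEQ_02", "STA", "STO"]
values = [12849, 12597, 14390, 21328, 22304, 12340, 13622, 724, 735]
# the second timestamp is invented to show the per-timestamp grouping
df = pd.DataFrame({
    "datetime": ["08-04-2021 11:30:58"] * 9 + ["08-04-2021 11:31:58"] * 9,
    "tagid": tags * 2,
    "value": values * 2,
})
out = combine_tags(df)
print(out)
```

Sorting on the full tagid keeps _01, _02, ... in order within each group regardless of how the input rows were arranged.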

