
Add columns to DataFrame with difference of specific columns based on values of another column

I have a DataFrame that looks like this:


+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
|   B_date   |      B_Time      | F_Type | Fix | Est | S | C_Type |           C_Time            |
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
| 2019-07-22 | 16:42:27.7325458 |      1 | 100 | 100 | 2 |      2 | 2019-07-22 16:42:47.2129273 |
| 2019-07-22 | 16:44:04.7817750 |      1 | 100 | 100 | 2 |      2 | 2019-07-22 16:45:26.2923547 |
| 2019-07-22 | 16:48:21.5976290 |      1 | 100 | 100 | 7 |        |                             |
| 2019-07-23 | 13:11:20.4519581 |      1 | 100 | 100 | 7 |        |                             |
| 2019-07-23 | 13:28:49.5092331 |      1 | 100 | 100 | 2 |      2 | 2019-07-23 13:28:54.5274793 |
| 2019-07-23 | 13:29:06.6108796 |      1 | 100 | 100 | 2 |      2 | 2019-07-23 13:30:48.5358081 |
| 2019-07-23 | 13:31:12.7684213 |      1 | 100 | 100 | 2 |      3 | 2019-07-23 13:33:50.9405643 |
| 2019-07-25 | 09:32:12.7799801 |      1 | 105 | 105 | 7 |        |                             |
| 2019-07-25 | 09:57:58.4536238 |      1 | 158 | 158 | 4 |        |                             |
| 2019-07-25 | 10:03:22.7888221 |      1 | 152 | 152 | 2 |      2 | 2019-07-25 10:03:27.9576175 |
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+

I need to get the following output:


+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
|   B_date   |            B_Time             | F_Type | Fix | Est | S | C_Type |            C_Time             | cancel_diff_1 |  cancel_diff_2  | cancel_diff_3 |
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
| 2019-07-22 | 2019-07-22 16:42:27.732545800 |      1 | 100 | 100 | 2 | 2      | 2019-07-22 16:42:47.212927300 | NaT           | 00:00:19.480381 | NaT           |
| 2019-07-22 | 2019-07-22 16:44:04.781775000 |      1 | 100 | 100 | 2 | 2      | 2019-07-22 16:45:26.292354700 | NaT           | 00:01:21.510579 | NaT           |
| 2019-07-22 | 2019-07-22 16:48:21.597629000 |      1 | 100 | 100 | 7 | NaN    | NaT                           | NaT           | NaT             | NaT           |
| 2019-07-23 | 2019-07-23 13:11:20.451958100 |      1 | 100 | 100 | 7 | NaN    | NaT                           | NaT           | NaT             | NaT           |
| 2019-07-23 | 2019-07-23 13:28:49.509233100 |      1 | 100 | 100 | 2 | 2      | 2019-07-23 13:28:54.527479300 | NaT           | 00:00:05.018246 | NaT           |
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+

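I have actually already done this with a function, but it checks and assigns the values row by row, in what you might call the plain-Python way. I would like to do it in a simple pandas way.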

IIUC, try this:

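import pandas as pd

# Combine B_date and B_Time into one timestamp; parse C_Time (blanks become NaT)
df['B_Time'] = pd.to_datetime(df['B_date'] + ' ' + df['B_Time'])
df['C_Time'] = pd.to_datetime(df['C_Time'], errors='coerce')

# For each C_Type, fill the difference only on the rows with that type
df.loc[df['C_Type'] == 1, 'cancel_diff_1'] = df.loc[df['C_Type'] == 1, 'C_Time'] - df.loc[df['C_Type'] == 1, 'B_Time']
df.loc[df['C_Type'] == 2, 'cancel_diff_2'] = df.loc[df['C_Type'] == 2, 'C_Time'] - df.loc[df['C_Type'] == 2, 'B_Time']
df.loc[df['C_Type'] == 3, 'cancel_diff_3'] = df.loc[df['C_Type'] == 3, 'C_Time'] - df.loc[df['C_Type'] == 3, 'B_Time']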
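
A possible variation on the same idea, shown as a minimal sketch rather than the answer's own method: compute the difference once and use `Series.where` to keep it only on rows whose `C_Type` matches, so every new column is timedelta-typed and filled with `NaT` elsewhere. The three sample rows are copied from the question's table, and the loop variable `cancel_type` is a name introduced here for illustration.

```python
import pandas as pd

# Sample rows taken from the question's table (a small subset, for illustration)
df = pd.DataFrame({
    "B_date": ["2019-07-22", "2019-07-22", "2019-07-23"],
    "B_Time": ["16:42:27.7325458", "16:48:21.5976290", "13:31:12.7684213"],
    "C_Type": [2, None, 3],
    "C_Time": ["2019-07-22 16:42:47.2129273", None, "2019-07-23 13:33:50.9405643"],
})

# Build full timestamps; missing C_Time values become NaT
df["B_Time"] = pd.to_datetime(df["B_date"] + " " + df["B_Time"])
df["C_Time"] = pd.to_datetime(df["C_Time"], errors="coerce")

# Compute the difference once, then keep it only where C_Type matches
delta = df["C_Time"] - df["B_Time"]
for cancel_type in (1, 2, 3):
    df[f"cancel_diff_{cancel_type}"] = delta.where(df["C_Type"] == cancel_type)

print(df[["C_Type", "cancel_diff_1", "cancel_diff_2", "cancel_diff_3"]])
```

Because `where` returns a full-length Series, each column is created even when no row has the corresponding `C_Type`, as is the case for type 1 in this sample.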

