Add columns to DataFrame with difference of specific columns based on values of another column
I have a DataFrame that looks like the following:
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
| B_date | B_Time | F_Type | Fix | Est | S | C_Type | C_Time |
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
| 2019-07-22 | 16:42:27.7325458 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:42:47.2129273 |
| 2019-07-22 | 16:44:04.7817750 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:45:26.2923547 |
| 2019-07-22 | 16:48:21.5976290 | 1 | 100 | 100 | 7 | | |
| 2019-07-23 | 13:11:20.4519581 | 1 | 100 | 100 | 7 | | |
| 2019-07-23 | 13:28:49.5092331 | 1 | 100 | 100 | 2 | 2 | 2019-07-23 13:28:54.5274793 |
| 2019-07-23 | 13:29:06.6108796 | 1 | 100 | 100 | 2 | 2 | 2019-07-23 13:30:48.5358081 |
| 2019-07-23 | 13:31:12.7684213 | 1 | 100 | 100 | 2 | 3 | 2019-07-23 13:33:50.9405643 |
| 2019-07-25 | 09:32:12.7799801 | 1 | 105 | 105 | 7 | | |
| 2019-07-25 | 09:57:58.4536238 | 1 | 158 | 158 | 4 | | |
| 2019-07-25 | 10:03:22.7888221 | 1 | 152 | 152 | 2 | 2 | 2019-07-25 10:03:27.9576175 |
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
I need to get the following output:
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
| B_date | B_Time | F_Type | Fix | Est | S | C_Type | C_Time | cancel_diff_1 | cancel_diff_2 | cancel_diff_3 |
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
| 2019-07-22 | 2019-07-22 16:42:27.732545800 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:42:47.212927300 | NaT | 00:00:19.480381 | NaT |
| 2019-07-22 | 2019-07-22 16:44:04.781775000 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:45:26.292354700 | NaT | 00:01:21.510579 | NaT |
| 2019-07-22 | 2019-07-22 16:48:21.597629000 | 1 | 100 | 100 | 7 | NaN | NaT | NaT | NaT | NaT |
| 2019-07-23 | 2019-07-23 13:11:20.451958100 | 1 | 100 | 100 | 7 | NaN | NaT | NaT | NaT | NaT |
| 2019-07-23 | 2019-07-23 13:28:49.509233100 | 1 | 100 | 100 | 2 | 2 | 2019-07-23 13:28:54.527479300 | NaT | 00:00:05.018246 | NaT |
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
I have already done this using a function that assigns values and checks them, which you could call the Python way, but I want to do it in simple pandas.
IIUC, try this:
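import pandas as pd
# Combine date and time into one timestamp; blank or unparseable C_Time cells become NaT
df['B_Time'] = pd.to_datetime(df['B_date'] + ' ' + df['B_Time'])
df['C_Time'] = pd.to_datetime(df['C_Time'], errors='coerce')
# For each cancel type, compute C_Time - B_Time on the rows with that C_Type only
df.loc[df['C_Type']==1, 'cancel_diff_1'] = df.loc[df['C_Type']==1, 'C_Time'] - df.loc[df['C_Type']==1, 'B_Time']
df.loc[df['C_Type']==2, 'cancel_diff_2'] = df.loc[df['C_Type']==2, 'C_Time'] - df.loc[df['C_Type']==2, 'B_Time']
df.loc[df['C_Type']==3, 'cancel_diff_3'] = df.loc[df['C_Type']==3, 'C_Time'] - df.loc[df['C_Type']==3, 'B_Time']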
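If it helps, here is a self-contained sketch of the same idea written with `Series.where` in a loop instead of three `.loc` assignments. It is only one possible variant, not necessarily what the answer intended. The three sample rows are copied from the question's table, `None` stands in for the blank cells, and the names `elapsed` and `cancel_type` are made up for illustration.

```python
import pandas as pd

# Three rows taken from the question's table; None marks the blank cells.
df = pd.DataFrame({
    "B_date": ["2019-07-22", "2019-07-22", "2019-07-23"],
    "B_Time": ["16:42:27.7325458", "16:48:21.5976290", "13:31:12.7684213"],
    "C_Type": [2, None, 3],
    "C_Time": ["2019-07-22 16:42:47.2129273", None, "2019-07-23 13:33:50.9405643"],
})

# Build full timestamps; missing C_Time values become NaT.
df["B_Time"] = pd.to_datetime(df["B_date"] + " " + df["B_Time"])
df["C_Time"] = pd.to_datetime(df["C_Time"], errors="coerce")

# Compute the difference once, then keep it only where C_Type matches.
elapsed = df["C_Time"] - df["B_Time"]
for cancel_type in (1, 2, 3):
    df[f"cancel_diff_{cancel_type}"] = elapsed.where(df["C_Type"] == cancel_type)

print(df[["C_Type", "cancel_diff_1", "cancel_diff_2", "cancel_diff_3"]])
```

Because `where` replaces the non-matching rows with `NaT`, each new column stays a timedelta column even when no row has that `C_Type`, which is the case for `cancel_diff_1` in this sample.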