簡體   English   中英

分組並減去熊貓中的第一次出現和最后一次出現

[英]group by and subtract first occurrence and last occurrence in pandas

我在熊貓中有以下數據框

 code    date         time         dip     flag   tank   qty
 123     2018-12-23   08:00:00     389     0      1      1300
 123     2018-12-23   09:00:00     380     0      1      1250
 123     2018-12-23   10:00:00     378     0      1      1200
 123     2018-12-23   11:00:00     345     1      1      1150
 123     2018-12-23   12:00:00     342     1      1      1100
 123     2018-12-23   13:00:00     340     1      1      1050
 123     2018-12-23   14:00:00     338     1      1      1000
 123     2018-12-23   15:00:00     380     0      1      1500
 123     2018-12-23   16:00:00     340     1      1      1000
 123     2018-12-23   17:00:00     340     1      1      1000
 123     2018-12-23   08:00:00     389     0      2      1300
 123     2018-12-23   09:00:00     380     0      2      1250
 123     2018-12-23   10:00:00     378     0      2      1200
 123     2018-12-23   11:00:00     345     1      2      1150
 123     2018-12-23   12:00:00     342     1      2      1100
 123     2018-12-23   13:00:00     340     1      2      1050
 123     2018-12-23   14:00:00     338     1      2      1000

我想找出dip低於 350 的次數,直到什么時間(以小時為單位)保持低於 350,以及低於 350 時的銷售數量是我想要的數據幀。 當下降小於 350 時,我已經將標志設置為 1

 code    date        tank     frequency    qty_sold    time
 123     2018-12-23  1        4            150         3
 123     2018-12-23  2        4            150         3

我可以通過 groupby 找到頻率。 需要一些幫助才能找到另外兩個

  df_agg= df.groupby(['code','date','tank']).agg({'flag':['sum']}).reset_index()

用:

#create datetimes column
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])

#add aggregation by first and last 
df_agg= df[df['dip'] < 350].groupby(['code','date','tank']).agg({'flag':['sum'], 
                                                                'datetime':['first','last'],
                                                                'qty':['first','last']})
#flatten MultiIndex
df_agg.columns = df_agg.columns.map('_'.join)

#substract columns, timedeltas convert to hours
df_agg['qty_sold'] = df_agg.pop('qty_first') - df_agg.pop('qty_last') 
df_agg['time'] = (df_agg.pop('datetime_last') - df_agg.pop('datetime_first'))
                       .dt.total_seconds().div(3600).astype(int)
#rename column and create default index
df_agg = df_agg.rename(columns={'flag_size':'frequency'}).reset_index()

print (df_agg)
   code        date  tank  flag_sum  qty_sold  time
0   123  2018-12-23     1         4       150     3
1   123  2018-12-23     2         4       150     3

編輯:

如果datetime值中沒有缺失值且date time頻率相差一小時,則解決方案有效。

如果差異更像是1小時和前 3 個級別的最后總和,則想法是為組創建新的輔助列g

df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])

df_agg= df[df['dip'] < 350].copy()

df_agg['g'] = (df_agg.groupby(['code','date','tank'])['datetime'].diff()
                     .ne(pd.Timedelta(1, 'H'))
                     .cumsum())

df_agg= df_agg.groupby(['code','date','tank','g']).agg({'flag':['sum'], 
                                                        'datetime':['first','last'],
                                                        'qty':['first','last']})
df_agg.columns = df_agg.columns.map('_'.join)
df_agg['qty_sold'] = df_agg.pop('qty_first') - df_agg.pop('qty_last') 
df_agg['time'] = ((df_agg.pop('datetime_last') - df_agg.pop('datetime_first'))
                         .dt.total_seconds().div(3600).astype(int))

df_agg = (df_agg.rename(columns={'flag_size':'frequency'})
                .sum(level=[0,1,2])
                .reset_index()
          )

print (df_agg)
   code        date  tank  flag_sum  qty_sold  time
0   123  2018-12-23     1         6       150     4
1   123  2018-12-23     2         4       150     3

你可以做:

# to get till what time (hour)
df.loc[df['dip'].lt(350),'time'].dt.hour.max()

# what is the quantity sold
df.loc[df['dip'].lt(350),'qty'].sum()

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM