Looking for a way to group by datetime when the datetime falls between two dates, using pandas in Python

I am trying to do the following using Pandas (Python).

I have a dataframe with the following columns:

Building, Door_Color, Door_Time_Open, Door_Time_Close, Opening_Width

I am trying to group the data by date and time so that, for each second, I get the number of open doors and the sum of their Opening_Width.

For example:

Data:
Building, Door_Color, Door_Time_Open, Door_Time_Close, Opening_Width
A, Red, 2000-01-01 00:00:00, 2000-01-01 00:00:05, 10
A, Red, 2000-01-01 00:00:02, 2000-01-01 00:00:04, 5

Desired result:
Date, Building, Door_Color, Door_Count, Sum_Opening_Width
2000-01-01 00:00:00, A, Red, 1, 10
2000-01-01 00:00:01, A, Red, 1, 10
2000-01-01 00:00:02, A, Red, 2, 15
2000-01-01 00:00:03, A, Red, 2, 15
2000-01-01 00:00:04, A, Red, 2, 15
2000-01-01 00:00:05, A, Red, 1, 10
2000-01-01 00:00:06, A, Red, 0, 0
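To reproduce the example, the two sample rows can be built directly as a DataFrame. This is a sketch that assumes the time columns should be parsed as datetimes; the name df simply matches the one used in the answers.

```python
import pandas as pd

# The two sample rows from the question, with the time columns parsed as datetimes
df = pd.DataFrame({
    "Building": ["A", "A"],
    "Door_Color": ["Red", "Red"],
    "Door_Time_Open": pd.to_datetime(["2000-01-01 00:00:00", "2000-01-01 00:00:02"]),
    "Door_Time_Close": pd.to_datetime(["2000-01-01 00:00:05", "2000-01-01 00:00:04"]),
    "Opening_Width": [10, 5],
})
print(df.dtypes)
```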

I know how to do a regular groupby on multiple columns and aggregate different columns separately, but I have no idea how to check whether the timestamp being grouped on falls between the open and close times of each row.

Any help would be much appreciated!

Edit 1: The data is fairly large, about 6 million rows.

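If the data is not too large (that is, it does not cover a long period of time), you can do a cross merge: pair every row with every second in the overall time range, keep only the pairs where the second falls between the open and close times, then group and aggregate: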

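import pandas as pd

# Make sure the time columns are datetimes, not strings
df['Door_Time_Open'] = pd.to_datetime(df['Door_Time_Open'])
df['Door_Time_Close'] = pd.to_datetime(df['Door_Time_Close'])

# One row per second over the whole period, plus a constant key for the cross merge
times = pd.DataFrame({'Date': pd.date_range(df['Door_Time_Open'].min(),
                                            df['Door_Time_Close'].max(), freq='s'),
                      'dummy': 1
                     })

# Pair every door with every second, keep the seconds during which the door was open
(df.assign(dummy=1)
   .merge(times, on='dummy')
   .query('Door_Time_Open <= Date <= Door_Time_Close')
   .groupby(['Date', 'Building', 'Door_Color'])
   ['Opening_Width'].agg(['count', 'sum'])
   .reset_index()
)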
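In pandas 1.2 and later, merge accepts how="cross", which builds the same Cartesian product without the helper dummy column. A minimal sketch on the question's sample rows, assuming a pandas version that supports how="cross"; the filter is written as a plain boolean mask, which is equivalent to the query string above.

```python
import pandas as pd

df = pd.DataFrame({
    "Building": ["A", "A"],
    "Door_Color": ["Red", "Red"],
    "Door_Time_Open": pd.to_datetime(["2000-01-01 00:00:00", "2000-01-01 00:00:02"]),
    "Door_Time_Close": pd.to_datetime(["2000-01-01 00:00:05", "2000-01-01 00:00:04"]),
    "Opening_Width": [10, 5],
})

# Every second from the earliest opening to the latest closing
times = pd.DataFrame({"Date": pd.date_range(df["Door_Time_Open"].min(),
                                            df["Door_Time_Close"].max(), freq="s")})

result = (df.merge(times, how="cross")  # Cartesian product, pandas >= 1.2
            # keep seconds where the door was open (both ends inclusive)
            .loc[lambda d: (d["Door_Time_Open"] <= d["Date"]) & (d["Date"] <= d["Door_Time_Close"])]
            .groupby(["Date", "Building", "Door_Color"])["Opening_Width"]
            .agg(["count", "sum"])
            .reset_index())
print(result)
```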

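Output (only seconds with at least one open door are produced, so the 00:00:06 row with a count of 0 from the desired result does not appear):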

                 Date Building Door_Color  count  sum
0 2000-01-01 00:00:00       A        Red       1   10
1 2000-01-01 00:00:01       A        Red       1   10
2 2000-01-01 00:00:02       A        Red       2   15
3 2000-01-01 00:00:03       A        Red       2   15
4 2000-01-01 00:00:04       A        Red       2   15
5 2000-01-01 00:00:05       A        Red       1   10
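Because the cross merge first creates one row for every (door, second) pair, it can become very large with 6 million rows spread over a long period. A different technique, not taken from either answer, is a sweep over open/close events: add the width when a door opens, subtract it one second after it closes, and take a running sum per Building/Door_Color over a full per-second index. This is a sketch that assumes whole-second timestamps and inclusive close times (as in the desired result); the names keys, events, pieces and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Building": ["A", "A"],
    "Door_Color": ["Red", "Red"],
    "Door_Time_Open": pd.to_datetime(["2000-01-01 00:00:00", "2000-01-01 00:00:02"]),
    "Door_Time_Close": pd.to_datetime(["2000-01-01 00:00:05", "2000-01-01 00:00:04"]),
    "Opening_Width": [10, 5],
})

keys = ["Building", "Door_Color"]

# +1 door and +width at the second the door opens
opens = df[keys].assign(Date=df["Door_Time_Open"],
                        Door_Count=1,
                        Sum_Opening_Width=df["Opening_Width"])
# -1 door and -width one second after it closes (the close second still counts as open)
closes = df[keys].assign(Date=df["Door_Time_Close"] + pd.Timedelta(seconds=1),
                         Door_Count=-1,
                         Sum_Opening_Width=-df["Opening_Width"])

# Net change per group and second
events = (pd.concat([opens, closes])
            .groupby(keys + ["Date"])[["Door_Count", "Sum_Opening_Width"]]
            .sum())

pieces = []
for (building, color), g in events.groupby(level=keys):
    g = g.droplevel(keys)  # index is now just Date
    seconds = pd.date_range(g.index.min(), g.index.max(), freq="s", name="Date")
    # Seconds without events get 0; the running total is the state at each second
    state = g.reindex(seconds, fill_value=0).cumsum()
    pieces.append(state.assign(Building=building, Door_Color=color))

result = (pd.concat(pieces)
            .reset_index()
            [["Date", "Building", "Door_Color", "Door_Count", "Sum_Opening_Width"]])
print(result)
```

The row count per group depends only on that group's time span, not on the number of doors, and the trailing close event produces the 0 row that the desired result includes.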

Alternatively, expand each row into one row per second between its open and close times, then group:

import pandas as pd

def expand_row(r):
    # One row per second from opening to closing (both ends included)
    df1 = pd.DataFrame()
    df1['Date'] = pd.date_range(r['Door_Time_Open'], r['Door_Time_Close'], freq='s')
    # Copy the grouping and width columns onto every second
    for col in ['Building', 'Door_Color', 'Opening_Width']:
        df1[col] = r[col]
    return df1

df['Door_Time_Open'] = pd.to_datetime(df['Door_Time_Open'])
df['Door_Time_Close'] = pd.to_datetime(df['Door_Time_Close'])

# iterrows runs in Python, so this loop is slow on millions of rows
df_list = [expand_row(row) for _, row in df.iterrows()]
data = (pd.concat(df_list)
          .groupby(['Date', 'Building', 'Door_Color'])['Opening_Width']
          .agg(['count', 'sum']))
print(data)
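The iterrows loop above is slow on millions of rows. A sketch of the same per-second expansion without a Python-level loop, using Index.repeat and groupby().cumcount() (a substitution of mine, not part of the original answer), assuming whole-second timestamps; n_secs, rep and expanded are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Building": ["A", "A"],
    "Door_Color": ["Red", "Red"],
    "Door_Time_Open": pd.to_datetime(["2000-01-01 00:00:00", "2000-01-01 00:00:02"]),
    "Door_Time_Close": pd.to_datetime(["2000-01-01 00:00:05", "2000-01-01 00:00:04"]),
    "Opening_Width": [10, 5],
})

# Seconds each door is open, counting both the opening and the closing second
n_secs = (df["Door_Time_Close"] - df["Door_Time_Open"]) // pd.Timedelta(seconds=1) + 1

# Repeat each row once per open second
rep = df.index.repeat(n_secs.to_numpy())
expanded = df.loc[rep].reset_index(drop=True)

# 0, 1, 2, ... within each original row, turned into a per-second timestamp
offset = expanded.groupby(rep.to_numpy()).cumcount()
expanded["Date"] = expanded["Door_Time_Open"] + pd.to_timedelta(offset, unit="s")

data = (expanded.groupby(["Date", "Building", "Door_Color"])["Opening_Width"]
                .agg(["count", "sum"])
                .reset_index())
print(data)
```

Memory still grows with the total number of open seconds across all doors, so very long openings will produce many rows.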
