I'm new to pandas. I need to calculate the time spent by each person at each location, and drop rows that have no matching pair in the Date column. My data looks like this:
Unit Name Location Date Time
0 K1 Somebody1 LOC1 2020-05-12 07:00
1 K1 Somebody1 LOC1 2020-05-12 20:10
2 K1 Somebody1 LOC1 2020-05-13 06:00
3 K1 Somebody1 LOC1 2020-05-13 20:00
4 K1 Somebody1 LOC1 2020-05-14 06:37
5 K1 Somebody1 LOC2 2020-05-15 07:00
6 K1 Somebody1 LOC2 2020-05-15 20:10
7 K1 Somebody1 LOC2 2020-05-16 06:00
8 K1 Somebody1 LOC2 2020-05-16 20:00
9 K1 Somebody1 LOC2 2020-05-17 06:37
10 K1 Somebody2 LOC2 2020-05-13 07:00
11 K1 Somebody2 LOC2 2020-05-14 10:10
12 K1 Somebody2 LOC2 2020-05-14 16:50
13 K1 Somebody2 LOC2 2020-05-15 05:36
14 K1 Somebody3 LOC1 2020-05-13 07:00
15 K1 Somebody3 LOC1 2020-05-14 10:10
16 K1 Somebody3 LOC1 2020-05-14 16:50
17 K1 Somebody3 LOC1 2020-05-15 05:36
I only managed to convert the Time column to time objects with:
from datetime import datetime
df['Time'] = df['Time'].apply(lambda x: datetime.strptime(x,'%H:%M').time())
I tried pivot tables, groupby, and for loops, and I'm out of ideas. I want the output to look like this:
LOC1
Somebody1 2020-05-12 13h 10m
2020-05-13 14h 00m
TOTAL 27h 10m
Somebody2 date hours
date hours
TOTAL sum for somebody2
Somebody3 date hours
date hours
TOTAL sum for somebody3
LOC2
Somebody1 date hours
date hours
TOTAL sum for somebody1
Somebody2 date hours
date hours
TOTAL sum for somebody2
or something similar
IIUC, you can use groupby and combine_first:
import numpy as np
import pandas as pd
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df1 = df.groupby(['Name','Location', df['datetime'].dt.normalize()])\
.agg(start=('datetime','first'),
end=('datetime','last'))
df1['timespent'] = (df1['end'] - df1['start']) / np.timedelta64(1,'h')
# create total row.
m = df1.unstack(['Name','Location'])['timespent'].sum().unstack()
m = m.assign(TOTAL=m.sum(1)).stack().to_frame('timespent')
final = df1.drop(['start','end'],axis=1).combine_first(m)
#if you want to remove single entry days
final[final['timespent'] > 0]
                                 timespent
Name      Location datetime
Somebody1 LOC1     2020-05-12    13.166667
                   2020-05-13    14.000000
          TOTAL    NaT           27.166667
Somebody2 LOC2     2020-05-14     6.666667
          TOTAL    NaT            6.666667
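The answer above reports decimal hours, while the question asks for an "13h 10m" style and for days without a pair to be dropped. A minimal sketch of one way to do that follows. It assumes the first and last timestamp of a day form the pair, and that a day with a single timestamp should be discarded. The sample frame is a hypothetical subset of the question's data, and the names per_day, fmt and totals are illustrative only.

```python
import pandas as pd

# Hypothetical subset of the question's data (Somebody1 at LOC1 only).
df = pd.DataFrame({
    "Name": ["Somebody1"] * 5,
    "Location": ["LOC1"] * 5,
    "Date": ["2020-05-12", "2020-05-12", "2020-05-13", "2020-05-13", "2020-05-14"],
    "Time": ["07:00", "20:10", "06:00", "20:00", "06:37"],
})
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# First and last timestamp per person, location and day, plus a row count.
per_day = (
    df.groupby(["Name", "Location", "Date"])["datetime"]
      .agg(start="min", end="max", n="count")
)

# Assumption: a day with fewer than two timestamps has no pair and is dropped.
per_day = per_day[per_day["n"] >= 2].copy()
per_day["spent"] = per_day["end"] - per_day["start"]


def fmt(td):
    """Format a Timedelta as e.g. '13h 10m'."""
    minutes = int(td.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


per_day["spent_str"] = per_day["spent"].map(fmt)

# Sum the timedeltas per person and location, then format the totals.
totals = per_day.groupby(["Name", "Location"])["spent"].sum().map(fmt)

print(per_day["spent_str"])
print(totals)
```

Keeping the durations as timedeltas until the final formatting step means the totals are summed exactly, rather than from rounded decimal hours.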
You can begin with grep to collect the times for each person and then calculate the difference between each pair of rows. For example, put the people's names into a file called list-names and loop over it with grep:
for i in $(cat list-names);do grep "$i" a.csv | awk '{print $6}';done
where a.csv contains:
0 K1 Somebody1 LOC1 2020-05-12 17:00
1 K1 Somebody1 LOC1 2020-05-12 20:10
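Also, to get the difference in hours between consecutive rows (awk uses only the leading number of each time, so the minutes are ignored), do: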
awk '
NR == 1{old = $6; next}
{print $6 - old; old = $6}
' a.csv
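For comparison, here is a minimal Python sketch of the same consecutive-row difference that also keeps the minutes. It assumes the whitespace-separated layout of a.csv shown above, with the date in the fifth field and the time in the sixth. The two sample rows are inlined so the sketch runs on its own, and the function name consecutive_diffs is hypothetical.

```python
from datetime import datetime

# The two sample rows from a.csv, inlined so the sketch is self-contained.
rows = """\
0 K1 Somebody1 LOC1 2020-05-12 17:00
1 K1 Somebody1 LOC1 2020-05-12 20:10
""".splitlines()


def consecutive_diffs(lines):
    """Return the time difference between each row and the previous one."""
    stamps = []
    for line in lines:
        fields = line.split()
        # fields[4] is the date and fields[5] the time (awk's $5 and $6).
        stamps.append(
            datetime.strptime(fields[4] + " " + fields[5], "%Y-%m-%d %H:%M")
        )
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


diffs = consecutive_diffs(rows)
print(diffs[0])  # 3:10:00
```

Like the awk version, this compares every row with the one before it, so rows belonging to different people or locations would need to be separated first.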