How to find the total play time of each week for a given date in Python?
I have a dataframe that looks like the following:
import pandas as pd

k = {'user_id': [1,1,1,1,1,2,2,2,3,3,3,3,3,4,4,4,5,5],
     'created': ['2/09/2021','2/10/2021','2/16/2021','2/17/2021','3/09/2021','3/10/2021','3/18/2021','3/19/2021',
                 '2/19/2021','2/20/2021','2/26/2021','2/27/2021','3/09/2021','2/10/2021','2/18/2021','3/19/2021',
                 '3/24/2021','3/30/2021'],
     'stop_time': [11,12,13,14,15,25,26,27,6,7,8,9,10,11,12,13,25,26],
     'play_time': [10,11,12,13,14,24,25,26,5,6,7,8,9,10,11,13,24,25]}
df = pd.DataFrame(data=k)
df['created'] = pd.to_datetime(df['created'], format='%m/%d/%Y')
df['total_play_time'] = df['stop_time'] - df['play_time']
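Now we need to use each user_id's first date as the start date of that user's week 1. For example, '2/09/2021' is the start of week 1 for user_id 1, and '3/10/2021' (user_id 2's earliest date in the data) is the start of week 1 for user_id 2.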
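A minimal sketch of that week definition in pandas, using only the rows for user_ids 1 and 2 from the sample data: `groupby(...).transform("min")` puts each user's first date on every row of that user, and floor-dividing the day offset by 7 gives a 1-based week number. The column names `week_start` and `week` are illustrative assumptions, not names from the question.

```python
import pandas as pd

# Rows for user_ids 1 and 2 from the question's sample data
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "created": pd.to_datetime(
        ["2/09/2021", "2/10/2021", "2/16/2021", "2/17/2021", "3/09/2021",
         "3/10/2021", "3/18/2021", "3/19/2021"],
        format="%m/%d/%Y",
    ),
})

# Each user's first 'created' date, repeated on every row of that user
df["week_start"] = df.groupby("user_id")["created"].transform("min")
# Days 0-6 after the start are week 1, days 7-13 are week 2, and so on
df["week"] = (df["created"] - df["week_start"]).dt.days // 7 + 1
print(df)
```

With this definition, 2/16/2021 is day 7 for user_id 1 and therefore already falls in week 2.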
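We need to sum each user_id's total play time per week, producing a sum for every week up to the current date (for example, if the report is run today, it must give the weekly sums up to today), with a result like the following: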
ID  week1  week2  week3  week4  week5  week6  week7  week8  week9  week10  week11  week12
1   3      2      0      0      0      0      0      0      0      0       0       0
2   1      2      0      0      0      0      0
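The expected table appears to give each user a different number of weeks, counted from that user's start date up to the date the report is run. A minimal sketch of one way to build that shape, assuming a hypothetical helper `weekly_play_time(df, report_date)` that is not from the question: it numbers weeks from each user's first date, pivots the sums into one column per week, fills weeks without rows with 0, and leaves weeks that have not started yet for a user as NaN.

```python
import pandas as pd


def weekly_play_time(df: pd.DataFrame, report_date) -> pd.DataFrame:
    """Hypothetical helper: wide table of total_play_time per user and week.

    Week 1 of each user starts on that user's first 'created' date; weeks that
    begin after report_date for a given user are left as NaN.
    """
    report_date = pd.Timestamp(report_date)
    # Ignore activity recorded after the report date
    df = df[df["created"] <= report_date].copy()
    # Each user's first date, broadcast to all of that user's rows
    start = df.groupby("user_id")["created"].transform("min")
    # Days 0-6 -> week 1, days 7-13 -> week 2, ...
    df["week"] = (df["created"] - start).dt.days // 7 + 1
    sums = df.pivot_table(index="user_id", columns="week",
                          values="total_play_time", aggfunc="sum", fill_value=0)
    # Weeks each user has had so far, counting the week containing report_date
    first = df.groupby("user_id")["created"].min()
    n_weeks = (report_date - first).dt.days // 7 + 1
    all_weeks = list(range(1, int(n_weeks.max()) + 1))
    # Weeks with no rows get 0 ...
    sums = sums.reindex(columns=all_weeks, fill_value=0)
    # ... except weeks that have not started yet for that user, which become NaN
    started = pd.DataFrame({w: w <= n_weeks for w in all_weeks})
    sums = sums.where(started)
    sums.columns = [f"week{w}" for w in all_weeks]
    return sums


# Usage with the question's data
k = {'user_id': [1,1,1,1,1,2,2,2,3,3,3,3,3,4,4,4,5,5],
     'created': ['2/09/2021','2/10/2021','2/16/2021','2/17/2021','3/09/2021','3/10/2021','3/18/2021','3/19/2021',
                 '2/19/2021','2/20/2021','2/26/2021','2/27/2021','3/09/2021','2/10/2021','2/18/2021','3/19/2021',
                 '3/24/2021','3/30/2021'],
     'stop_time': [11,12,13,14,15,25,26,27,6,7,8,9,10,11,12,13,25,26],
     'play_time': [10,11,12,13,14,24,25,26,5,6,7,8,9,10,11,13,24,25]}
df = pd.DataFrame(data=k)
df["created"] = pd.to_datetime(df["created"], format="%m/%d/%Y")
df["total_play_time"] = df["stop_time"] - df["play_time"]

# A fixed date keeps the example reproducible; use pd.Timestamp.today().normalize() for "up to today"
result = weekly_play_time(df, "2021-03-31")
print(result)
```

The NaN cells are what make the rows differ in length, as in the expected output; replace them with `fillna(0)` if every row should have the same width.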
Answer:
import pandas as pd

# Start date (first 'created' date) of each user, repeated on every row of that user
start_dates = df.groupby("user_id")["created"].transform("min")
# Subtract the start date so that all users share a common baseline
df["time_since_start"] = df["created"] - start_dates
# Whole days since the start are easier to bin than Timedelta objects
df["days"] = df["time_since_start"].dt.days
# Number of weekly bins needed so that the largest offset of any user still fits
max_weeks = df["days"].max() // 7 + 1
# Left-closed bins [0, 7), [7, 14), ... days, with readable labels
bins = [7 * wk for wk in range(max_weeks + 1)]
labels = ["week " + str(wk + 1) for wk in range(max_weeks)]
# right=False puts each user's first date (day 0) in week 1 and day 7 in week 2
df["bin"] = pd.cut(df["days"], bins, labels=labels, right=False)
# Sum per user and week; observed=False keeps weeks without data (as 0)
df.groupby(["user_id", "bin"], observed=False)["total_play_time"].sum()

user_id  bin
1        week 1    2
         week 2    2
         week 3    0
         week 4    0
         week 5    1
         week 6    0
2        week 1    1
         week 2    2
         week 3    0
         week 4    0
         week 5    0
         week 6    0
3        week 1    2
         week 2    2
         week 3    1
         week 4    0
         week 5    0
         week 6    0
4        week 1    1
         week 2    1
         week 3    0
         week 4    0
         week 5    0
         week 6    0
5        week 1    2
         week 2    0
         week 3    0
         week 4    0
         week 5    0
         week 6    0
Name: total_play_time, dtype: int64
Then, if you really need it, you can reshape the result into wide format.
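One way to do that reshape is `Series.unstack`, which moves the `bin` level of the index into columns. This sketch uses a small illustrative Series shaped like the groupby output (the numbers are only an example); renaming the index to `ID` mirrors the header in the question.

```python
import pandas as pd

# Illustrative long-format result shaped like the groupby output:
# a Series indexed by (user_id, bin)
weekly = pd.Series(
    [2, 2, 0, 1, 2, 0],
    index=pd.MultiIndex.from_tuples(
        [(1, "week 1"), (1, "week 2"), (1, "week 3"),
         (2, "week 1"), (2, "week 2"), (2, "week 3")],
        names=["user_id", "bin"],
    ),
    name="total_play_time",
)

# One row per user, one column per week
wide = weekly.unstack("bin")
wide.index.name = "ID"
wide.columns.name = None
print(wide)
```

In the real result the `bin` level comes from `pd.cut` and is categorical, so the week columns should keep the label order (week 10 after week 9) rather than plain string order.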