
How to find the total play time of each week for the given date in Python?

I have a dataframe that looks like the one below:

import pandas as pd

k = {'user_id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5],
     'created': ['2/09/2021', '2/10/2021', '2/16/2021', '2/17/2021', '3/09/2021', '3/10/2021', '3/18/2021', '3/19/2021',
                 '2/19/2021', '2/20/2021', '2/26/2021', '2/27/2021', '3/09/2021', '2/10/2021', '2/18/2021', '3/19/2021',
                 '3/24/2021', '3/30/2021'],
     'stop_time': [11, 12, 13, 14, 15, 25, 26, 27, 6, 7, 8, 9, 10, 11, 12, 13, 25, 26],
     'play_time': [10, 11, 12, 13, 14, 24, 25, 26, 5, 6, 7, 8, 9, 10, 11, 13, 24, 25]}

df = pd.DataFrame(data=k)

# Parse the month/day/year strings into datetimes
df['created'] = pd.to_datetime(df['created'], format='%m/%d/%Y')
# Play time per row: stop_time minus play_time
df['total_play_time'] = df['stop_time'] - df['play_time']


Now we need to use each user_id's first date as the start date of that user's week 1. For example, '2/09/2021' is the week 1 start date for user_id 1, and '3/10/2021' is the week 1 start date for user_id 2.

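We need to sum the total play time of each week for every user_id, and keep producing a sum for every week up to the current date (for example, if the report is run today, it must give the weekly sums up to today), giving a result like the following: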

ID  week1  week2  week3  week4  week5  week6  week7  week8  week9  week10  week11  week12
1   3      2      0      0      0      0      0      0      0      0       0       0
2   1      2      0      0      0      0      0
import pandas as pd

# Start date of each user, repeated on every row of that user
start_dates = df.groupby("user_id")["created"].transform("min")

# Subtract the start date to have a common baseline for all users
df["time_since_start"] = df["created"] - start_dates
# We get Timedelta values, but whole days are easier to bin
df["t"] = df["time_since_start"].dt.days

# Longest time any user has been active, used to size the bins
max_days = df["t"].max()
# Number of 7-day weeks needed to cover it
max_weeks = int(max_days // 7) + 1

# Make the bin edges (in days) and the corresponding readable labels
bins = [7 * wk for wk in range(max_weeks + 1)]
labels = ["week " + str(wk + 1) for wk in range(max_weeks)]

# Bin the data and aggregate the result.
# right=False gives the intervals [0, 7), [7, 14), ... so that a user's
# first day (t == 0) lands in week 1 and day 7 starts week 2. With the
# default right=True, rows at t == 0 would fall outside every bin.
df["bin"] = pd.cut(df["t"], bins, labels=labels, right=False)
# observed=False keeps the empty weeks in the result as zeros
df.groupby(["user_id", "bin"], observed=False)["total_play_time"].sum()

user_id  bin   
1        week 1    2
         week 2    2
         week 3    0
         week 4    0
         week 5    1
         week 6    0
2        week 1    1
         week 2    2
         week 3    0
         week 4    0
         week 5    0
         week 6    0
3        week 1    2
         week 2    2
         week 3    1
         week 4    0
         week 5    0
         week 6    0
4        week 1    1
         week 2    1
         week 3    0
         week 4    0
         week 5    0
         week 6    0
5        week 1    2
         week 2    0
         week 3    0
         week 4    0
         week 5    0
         week 6    0
Name: total_play_time, dtype: int64

Then, if you really need to, you can reshape the dataframe to wide format.
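
The reshape to wide format could look something like the sketch below. It is not the answerer's code: it reuses the sample data from the question, counts weeks with integer division on whole days instead of `pd.cut`, and uses a made-up `report_date` to stand in for "today". The names `report_date`, `week`, `n_weeks` and `wide` are hypothetical and only there for illustration.

```python
import pandas as pd

# Sample data copied from the question
k = {'user_id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5],
     'created': ['2/09/2021', '2/10/2021', '2/16/2021', '2/17/2021', '3/09/2021',
                 '3/10/2021', '3/18/2021', '3/19/2021', '2/19/2021', '2/20/2021',
                 '2/26/2021', '2/27/2021', '3/09/2021', '2/10/2021', '2/18/2021',
                 '3/19/2021', '3/24/2021', '3/30/2021'],
     'stop_time': [11, 12, 13, 14, 15, 25, 26, 27, 6, 7, 8, 9, 10, 11, 12, 13, 25, 26],
     'play_time': [10, 11, 12, 13, 14, 24, 25, 26, 5, 6, 7, 8, 9, 10, 11, 13, 24, 25]}
df = pd.DataFrame(k)
df['created'] = pd.to_datetime(df['created'], format='%m/%d/%Y')
df['total_play_time'] = df['stop_time'] - df['play_time']

# Assumption: the report runs up to this (hypothetical) date
report_date = pd.Timestamp('2021-04-01')

# 1-based week number, counted from each user's own first date
first = df.groupby('user_id')['created'].transform('min')
df['week'] = (df['created'] - first).dt.days // 7 + 1

# Long -> wide: one row per user, one column per week that has data
wide = df.pivot_table(index='user_id', columns='week',
                      values='total_play_time', aggfunc='sum', fill_value=0)

# Add zero-filled columns for weeks without any rows, up to the report date.
# The user with the earliest start date has the most weeks.
n_weeks = (report_date - df['created'].min()).days // 7 + 1
wide = wide.reindex(columns=range(1, n_weeks + 1), fill_value=0)
wide.columns = [f'week{w}' for w in wide.columns]

print(wide)
```

Every user gets the same set of week columns here, so weeks that have not happened yet for a user who started later simply show 0. For a live report, `report_date` could be replaced with `pd.Timestamp.today()`.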
