Sum values of a column (Revenue) based on the values of another column (Date) and the value of another column (UserId) in pandas
I have two tables/dataframes: users and activity.
In users I have the following columns: UserId, Country, DOB, Gender, RegDate, WeekAfterRegDate
where:
UserId: the Id of each user. There is only one row for each UserId in this dataframe/table. It is also the key column that links both tables/dataframes.
DOB: date of birth
RegDate: registration date of the user
WeekAfterRegDate: the date 7 days after registration
In activity I have the following columns: UserId, Date, Revenue
where:
UserId: the same column as in users, but it can appear in more than one row here, as there are different revenues.
I need to calculate the average revenue generated per user in the first week.
And I have been given these clues, which might be useful:
In summary, what I need to do is make a loop that sums Revenue between two dates for each UserId. The two dates are RegDate and WeekAfterRegDate.
I have been trying different methods, like groupby, etc., but I am a bit lost.
Make sure your date columns are actually stored as datetime, since comparing date strings will not reliably keep only the rows that fall within the first week. See here for converting strings into datetime.
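As a minimal sketch of that conversion, assuming the dates arrive as ISO-formatted strings (the sample values below are invented for illustration, and only the columns needed here are shown):

```python
import pandas as pd

# Hypothetical sample data; the real tables would come from your own source.
users = pd.DataFrame({
    "UserId": [1, 2],
    "RegDate": ["2021-01-01", "2021-01-05"],
})
activity = pd.DataFrame({
    "UserId": [1, 1, 2],
    "Date": ["2021-01-02", "2021-01-20", "2021-01-06"],
    "Revenue": [10.0, 5.0, 7.5],
})

# Convert the string columns to datetime64 so comparisons are chronological.
users["RegDate"] = pd.to_datetime(users["RegDate"])
activity["Date"] = pd.to_datetime(activity["Date"])

# WeekAfterRegDate as described in the question: seven days after registration.
users["WeekAfterRegDate"] = users["RegDate"] + pd.Timedelta(days=7)
```

If WeekAfterRegDate already exists as a string column, it can be converted with pd.to_datetime in the same way instead of being recomputed.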
Merge both tables:
import pandas as pd
df_merged = pd.merge(activity, users, on='UserId')
You get the activity table with the user's RegDate and WeekAfterRegDate added to each row.
Filter the merged table:
df_merged = df_merged.loc[df_merged['Date'] >= df_merged['RegDate']] # lower bound
df_merged = df_merged.loc[df_merged['Date'] < df_merged['WeekAfterRegDate']] # upper bound
The table now contains only the relevant rows.
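The two filtering steps can also be combined into a single boolean mask. The sketch below uses made-up data and assumes the merged frame already has datetime columns named as in the question; `in_first_week` and `first_week` are illustrative names:

```python
import pandas as pd

# Hypothetical merged frame (activity joined with users on UserId).
df_merged = pd.DataFrame({
    "UserId": [1, 1, 2],
    "Date": pd.to_datetime(["2021-01-02", "2021-01-20", "2021-01-12"]),
    "Revenue": [10.0, 5.0, 7.5],
    "RegDate": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-05"]),
})
df_merged["WeekAfterRegDate"] = df_merged["RegDate"] + pd.Timedelta(days=7)

# Keep rows where RegDate <= Date < WeekAfterRegDate.
in_first_week = (df_merged["Date"] >= df_merged["RegDate"]) & (
    df_merged["Date"] < df_merged["WeekAfterRegDate"]
)
first_week = df_merged.loc[in_first_week]
```

The upper bound is exclusive here, matching the filter above, so an activity dated exactly on WeekAfterRegDate is dropped.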
Now group by user and sum the revenue:
df_revenue = df_merged.groupby('UserId')['Revenue'].sum()
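The question asks for the average per user, so one more step is needed after the sum. A possible sketch with invented sample data follows. Users with no activity in their first week do not appear in the grouped result, so this version reindexes on all users and counts them as zero; whether they should count toward the average is an assumption about the intended metric.

```python
import pandas as pd

# Hypothetical sample data with three users; user 3 has no activity at all.
users = pd.DataFrame({
    "UserId": [1, 2, 3],
    "RegDate": pd.to_datetime(["2021-01-01", "2021-01-05", "2021-01-10"]),
})
users["WeekAfterRegDate"] = users["RegDate"] + pd.Timedelta(days=7)
activity = pd.DataFrame({
    "UserId": [1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2021-01-02", "2021-01-03", "2021-01-06", "2021-02-01"]
    ),
    "Revenue": [10.0, 5.0, 6.0, 100.0],
})

# Merge, keep first-week rows only, then sum revenue per user.
df_merged = pd.merge(activity, users, on="UserId")
mask = (df_merged["Date"] >= df_merged["RegDate"]) & (
    df_merged["Date"] < df_merged["WeekAfterRegDate"]
)
df_revenue = df_merged.loc[mask].groupby("UserId")["Revenue"].sum()

# Include users without first-week activity as 0 before averaging.
per_user = df_revenue.reindex(users["UserId"], fill_value=0.0)
average_first_week_revenue = per_user.mean()
```

Calling df_revenue.mean() directly would instead average only over users who generated some revenue in their first week.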
Here's what I'd do: first, make a list of the users from the first dataframe:
user_list = first_df.UserId.unique().tolist()
Then iterate over this list and over the second dataframe, something like this:
revenue_total = 0
for i in range(len(user_list)):
    for x in range(len(second_df)):
        # same user, and the activity date falls between RegDate and WeekAfterRegDate
        if (second_df['UserId'][x] == user_list[i]
                and first_df['RegDate'][i] <= second_df['Date'][x] <= first_df['WeekAfterRegDate'][i]):
            revenue_total = revenue_total + second_df['Revenue'][x]
Then simply divide the total revenue by the total number of users:
revenue_total / len(user_list)