
Sum values of a column (Revenue) based on: the values of another column (Date) AND the value of another column (UserId) in pandas

I have two tables/dataframes: users and activity.

In users I have the following columns: UserId, Country, DOB, Gender, RegDate, WeekAfterRegDate

where:

UserId: the Id of each user. It appears only once in this table (one row per UserId) and is the key column that links the two tables/dataframes.

DOB: date of birth

RegDate: registration date of the user

WeekAfterRegDate: the date 7 days after registration

In activity I have the following columns: UserId, Date, Revenue

where:

UserId: the same column as in users, but it can appear in more than one row here, as there are different revenues.

I need to calculate the average revenue generated per user in the first week.

And I have been given these clues, which might be useful:

  1. Merge the 2 datasets
  2. Calculate the days since registration for each user and date in the activity table
  3. Consider ALL REVENUE (not just the one generated by each user) generated in the first 7 days after registration for each user
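Clues 1 and 2 could be sketched roughly as follows. The sample rows are made up for illustration, and DaysSinceReg is a hypothetical helper column name that does not appear in the original tables.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
users = pd.DataFrame({
    "UserId": [1, 2],
    "RegDate": pd.to_datetime(["2021-01-01", "2021-01-10"]),
})
activity = pd.DataFrame({
    "UserId": [1, 1, 2],
    "Date": pd.to_datetime(["2021-01-03", "2021-01-09", "2021-01-10"]),
    "Revenue": [5.0, 7.0, 2.0],
})

# Clue 1: attach each user's RegDate to every activity row.
merged = activity.merge(users, on="UserId", how="left")

# Clue 2: whole days elapsed between registration and each activity row.
# "DaysSinceReg" is an illustrative column name, not part of the original data.
merged["DaysSinceReg"] = (merged["Date"] - merged["RegDate"]).dt.days

# Days 0 through 6 make up the first 7 days after registration.
first_week = merged[merged["DaysSinceReg"].between(0, 6)]
```

With a day counter like this, the WeekAfterRegDate column is not strictly needed for the filter, although comparing against it directly works just as well.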

In summary, what I need to do is make a loop that sums Revenue between two dates for each UserId. The two dates that bound the period are RegDate and WeekAfterRegDate.

I have been trying different methods, like groupby, etc., but I am a bit lost.

Make sure your date columns are actually in datetime, since you won't be able to compare strings reliably in order to keep only those rows that fall within the first week. Strings can be converted into datetime with pd.to_datetime.
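As a minimal sketch of that conversion, assuming the dates arrive as ISO-formatted strings (for example from a CSV); the sample rows are made up:

```python
import pandas as pd

# Hypothetical sample: Date arrives as plain strings.
activity = pd.DataFrame({
    "UserId": [1, 2],
    "Date": ["2021-01-03", "2021-01-10"],
    "Revenue": [5.0, 2.0],
})

# Convert the string column to datetime64 so date comparisons behave correctly.
activity["Date"] = pd.to_datetime(activity["Date"])
```

The same call would apply to RegDate and WeekAfterRegDate in users if they are stored as strings too.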

Merge both tables:

df_merged = pd.merge(activity, users, on='UserId')

You get the activity table with the matching user's dates (RegDate and WeekAfterRegDate) in each row.

Filter the merged table:

df_merged = df_merged.loc[df_merged['Date'] >= df_merged['RegDate']] # lower bound
df_merged = df_merged.loc[df_merged['Date'] < df_merged['WeekAfterRegDate']] # upper bound

The table now contains only the relevant rows.

Now group by user and sum the revenue:

df_revenue = df_merged.groupby('UserId')['Revenue'].sum()
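The grouped sum gives first-week revenue per user, but the question asks for the average across users. One way to finish could look like the sketch below. It assumes that users with no first-week activity should count as 0 in the average, which the question does not state explicitly; the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data; user 3 has no activity at all.
users = pd.DataFrame({
    "UserId": [1, 2, 3],
    "RegDate": pd.to_datetime(["2021-01-01", "2021-01-10", "2021-01-15"]),
})
users["WeekAfterRegDate"] = users["RegDate"] + pd.Timedelta(days=7)
activity = pd.DataFrame({
    "UserId": [1, 1, 2],
    "Date": pd.to_datetime(["2021-01-03", "2021-01-09", "2021-01-10"]),
    "Revenue": [5.0, 7.0, 2.0],
})

merged = pd.merge(activity, users, on="UserId")

# Keep rows with RegDate <= Date < WeekAfterRegDate.
in_week = (merged["Date"] >= merged["RegDate"]) & (merged["Date"] < merged["WeekAfterRegDate"])
revenue = merged.loc[in_week].groupby("UserId")["Revenue"].sum()

# Assumption: users without first-week revenue count as 0 in the average.
per_user = revenue.reindex(users["UserId"], fill_value=0.0)
average = per_user.mean()
```

If only paying users should be averaged, revenue.mean() on the grouped result would be the figure to use instead.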

Here's what I'd do: first, make a list of the users from the first dataframe:

user_list = first_df.UserId.unique().tolist()

Then iterate over this list and over the second dataframe, something like this:

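# Assumes first_df has a default 0..n-1 index and one row per user,
# so position i in user_list matches row i of first_df.
revenue_total = 0
for i in range(len(user_list)):
    for x in range(len(second_df)):
        same_user = second_df['UserId'][x] == user_list[i]
        # First week: RegDate <= Date < WeekAfterRegDate
        in_first_week = first_df['RegDate'][i] <= second_df['Date'][x] < first_df['WeekAfterRegDate'][i]
        if same_user and in_first_week:
            revenue_total = revenue_total + second_df['Revenue'][x]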

Then simply divide the total revenue by the number of users:

revenue_total / len(user_list)

