Python Pandas - Slice DataFrame based on Another Table's Values to Match to Column Name
I have two dataframes, df_stats and df_ratings.
df_stats looks like this:
| | Fruit | Rating_Threshold_Low | Rating_Threshold_High |
|---|---|---|---|
| 1 | Apple | 4 | 7 |
| 2 | Banana | 5 | 9 |
| 3 | Kiwi | 6 | 8 |
df_ratings looks like this (the first column is the Fruit and each subsequent column represents a rating):
| | Fruit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Apple | 2 | 4 | 7 | 13 | 2 | 0 | 16 | 1 | 9 | 22 |
| 2 | Banana | 6 | 4 | 2 | 1 | 8 | 7 | 5 | 3 | 9 | 0 |
| 3 | Kiwi | 21 | 4 | 3 | 6 | 8 | 9 | 9 | 8 | 7 | 5 |
My goal is to get, for each fruit, the sum of the number of ratings that fall within that fruit's rating threshold (each fruit's rating threshold is different). In other words, I want to add a column, Rating_Threshold_Sum, to df_stats that sums the counts in df_ratings for the ratings within the threshold. For example, for Apple, the rating threshold is between 4 and 7 (inclusive), so Rating_Threshold_Sum would be 13+2+0+16 = 31.
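As a quick check of the Apple arithmetic, here is a minimal sketch that rebuilds only Apple's row as a Series. It assumes the ratings 1 to 10 are integer labels, which may not match the real column labels; the names `apple`, `low`, `high` and `apple_sum` are made up for illustration.

```python
import pandas as pd

# Apple's row from df_ratings, indexed by rating value 1..10
# (integer labels are an assumption; the real columns may be strings).
apple = pd.Series([2, 4, 7, 13, 2, 0, 16, 1, 9, 22], index=range(1, 11))

low, high = 4, 7  # Apple's thresholds from df_stats

# Label-based slicing with .loc includes both endpoints,
# which matches the inclusive threshold described above.
apple_sum = apple.loc[low:high].sum()
print(apple_sum)  # 31
```

The same inclusive slice would apply to each fruit with its own low and high values.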
As a result, df_stats would have the Rating_Threshold_Sum column:
| | Fruit | Rating_Threshold_Low | Rating_Threshold_High | Rating_Threshold_Sum |
|---|---|---|---|---|
| 1 | Apple | 4 | 7 | 31 |
| 2 | Banana | 5 | 9 | 32 |
| 3 | Kiwi | 6 | 8 | 26 |
I am not sure exactly how to do that. I know that I may have to use df.apply with a custom function, or loop through each row, but aside from that I'm not sure of the best way to tackle this problem. Any advice or direction would be much appreciated. Thank you!
You could do something like this:
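# Assumes both frames list the fruits in the same row order and that the
# rating columns of df_ratings are labelled with strings ("1" ... "10").
sums = []
for i in range(len(df_stats)):
    # Inclusive threshold bounds for the i-th fruit
    min_v = df_stats["Rating_Threshold_Low"].iloc[i]
    max_v = df_stats["Rating_Threshold_High"].iloc[i]
    values = []
    for z in range(min_v, max_v + 1):
        # Number of ratings equal to z for the i-th fruit
        values.append(df_ratings[str(z)].iloc[i])
    sums.append(sum(values))
df_stats["Rating_Threshold_Sum"] = sums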
This is really complicated and there is probably a better way to do it, but it should work.
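One possible alternative without explicit loops is to build a boolean mask with NumPy broadcasting and sum the masked counts per row. This is only a sketch under assumptions: the two frames are rebuilt here from the tables in the question, the rating columns are assumed to be string labels "1" to "10", and the helper names (`counts`, `ratings`, `low`, `high`, `mask`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the tables in the question
df_stats = pd.DataFrame(
    {
        "Fruit": ["Apple", "Banana", "Kiwi"],
        "Rating_Threshold_Low": [4, 5, 6],
        "Rating_Threshold_High": [7, 9, 8],
    },
    index=[1, 2, 3],
)
df_ratings = pd.DataFrame(
    [
        ["Apple", 2, 4, 7, 13, 2, 0, 16, 1, 9, 22],
        ["Banana", 6, 4, 2, 1, 8, 7, 5, 3, 9, 0],
        ["Kiwi", 21, 4, 3, 6, 8, 9, 9, 8, 7, 5],
    ],
    columns=["Fruit"] + [str(r) for r in range(1, 11)],  # assumed string labels
    index=[1, 2, 3],
)

# Align the rating counts to df_stats by fruit name, not by row position
counts = df_ratings.set_index("Fruit").reindex(df_stats["Fruit"].tolist())

# Rating value of each column, shape (10,)
ratings = np.array([int(c) for c in counts.columns])
# Per-fruit bounds as column vectors, shape (3, 1)
low = df_stats["Rating_Threshold_Low"].to_numpy()[:, None]
high = df_stats["Rating_Threshold_High"].to_numpy()[:, None]

# mask[i, j] is True when rating j lies inside fruit i's inclusive threshold
mask = (ratings >= low) & (ratings <= high)

# Zero out counts outside the threshold, then sum across each row
df_stats["Rating_Threshold_Sum"] = (counts.to_numpy() * mask).sum(axis=1)
print(df_stats)
```

Matching on the Fruit name means the result does not depend on the two frames sharing the same row order, which the loop version relies on.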