Python Pandas - Slice DataFrame based on Another Table's Values to Match to Column Name

I have two dataframes, df_stats and df_ratings.

df_stats looks like this:

    Fruit  Rating_Threshold_Low  Rating_Threshold_High
1   Apple                     4                      7
2  Banana                     5                      9
3    Kiwi                     6                      8

df_ratings looks like this (the first column is the Fruit, and each subsequent column represents a rating):

    Fruit   1  2  3   4  5  6   7  8  9  10
1   Apple   2  4  7  13  2  0  16  1  9  22
2  Banana   6  4  2   1  8  7   5  3  9   0
3    Kiwi  21  4  3   6  8  9   9  8  7   5
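
For anyone who wants to reproduce the two tables, a minimal construction might look like the sketch below. It assumes the rating columns are labelled with the strings "1" through "10" and that both frames use the 1-based index shown above; neither detail is confirmed by the post, and the `rating_counts` name is only for illustration.

```python
import pandas as pd

# Assumption: both frames use the 1-based index shown in the tables.
df_stats = pd.DataFrame(
    {
        "Fruit": ["Apple", "Banana", "Kiwi"],
        "Rating_Threshold_Low": [4, 5, 6],
        "Rating_Threshold_High": [7, 9, 8],
    },
    index=[1, 2, 3],
)

# One row per fruit, one value per rating from 1 to 10.
rating_counts = [
    [2, 4, 7, 13, 2, 0, 16, 1, 9, 22],
    [6, 4, 2, 1, 8, 7, 5, 3, 9, 0],
    [21, 4, 3, 6, 8, 9, 9, 8, 7, 5],
]

# Assumption: rating columns are labelled with strings ("1".."10").
df_ratings = pd.DataFrame(
    rating_counts,
    columns=[str(r) for r in range(1, 11)],
    index=[1, 2, 3],
)
df_ratings.insert(0, "Fruit", ["Apple", "Banana", "Kiwi"])
```

If the real rating columns are integers rather than strings, the column lookups in any solution need to use `z` instead of `str(z)`.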

My goal is to get, for each fruit, the total number of ratings that fall within that fruit's rating threshold (each fruit's threshold is different). In other words, I want to add a column, Rating_Threshold_Sum, to df_stats that sums the values in df_ratings whose rating lies within the threshold. For example, for Apple, the rating threshold is between 4 and 7 (inclusive), so Rating_Threshold_Sum would be 13+2+0+16 = 31.
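
As a sanity check on the Apple example, the sum can be reproduced with a label slice over the rating columns. This is only a sketch: the `counts` name and the string column labels are assumptions for illustration, and only the Apple row from the table is included.

```python
import pandas as pd

# Apple's row from df_ratings, with Fruit as the index.
# Assumption: rating columns are labelled with strings "1".."10".
counts = pd.DataFrame(
    [[2, 4, 7, 13, 2, 0, 16, 1, 9, 22]],
    columns=[str(r) for r in range(1, 11)],
    index=pd.Index(["Apple"], name="Fruit"),
)

# .loc label slices include both endpoints, so "4":"7" selects
# the columns "4", "5", "6" and "7".
apple_sum = counts.loc["Apple", "4":"7"].sum()
print(apple_sum)
```

The slice works here because the columns are in rating order; it selects by column position between the two labels, not by numeric comparison.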

As a result, df_stats would have the Rating_Threshold_Sum column:

    Fruit  Rating_Threshold_Low  Rating_Threshold_High  Rating_Threshold_Sum
1   Apple                     4                      7                    31
2  Banana                     5                      9                    32
3    Kiwi                     6                      8                    26

I'm not sure exactly how to do that. I know I may have to use df.apply with a custom function, or loop through each row, but beyond that I'm not sure of the best way to tackle this problem. Any advice or direction would be much appreciated. Thank you!

You could do something like this:

sums = []
for i in range(len(df_stats)):
    # .values is an attribute, not a method; .iloc reads by row position
    min_v = df_stats["Rating_Threshold_Low"].iloc[i]
    max_v = df_stats["Rating_Threshold_High"].iloc[i]
    values = []
    for z in range(min_v, max_v + 1):
        # assumes string column labels ("1".."10") and the same row order in both frames
        x = df_ratings[str(z)].iloc[i]
        values.append(x)
    sums.append(sum(values))
df_stats["Rating_Threshold_Sum"] = sums

This is fairly verbose, and there is probably a better way to do it, but it should work.
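
One possible alternative, offered as a sketch rather than a definitive answer, is to build a boolean mask with NumPy broadcasting and sum the masked counts per row. It is a different technique from the loop above. It assumes the rating columns are strings that convert cleanly to integers, and it aligns the two frames by fruit name instead of by row position; the names `counts`, `rating_values` and `in_range` are illustrative only.

```python
import pandas as pd

df_stats = pd.DataFrame(
    {
        "Fruit": ["Apple", "Banana", "Kiwi"],
        "Rating_Threshold_Low": [4, 5, 6],
        "Rating_Threshold_High": [7, 9, 8],
    },
    index=[1, 2, 3],
)
df_ratings = pd.DataFrame(
    [
        [2, 4, 7, 13, 2, 0, 16, 1, 9, 22],
        [6, 4, 2, 1, 8, 7, 5, 3, 9, 0],
        [21, 4, 3, 6, 8, 9, 9, 8, 7, 5],
    ],
    columns=[str(r) for r in range(1, 11)],  # assumption: string labels
    index=[1, 2, 3],
)
df_ratings.insert(0, "Fruit", ["Apple", "Banana", "Kiwi"])

# Align the rating counts to df_stats by fruit name, not row position.
counts = df_ratings.set_index("Fruit").reindex(df_stats["Fruit"])

# Rating value represented by each column, as integers 1..10.
rating_values = counts.columns.astype(int).to_numpy()

# Column vectors of thresholds so they broadcast against rating_values.
low = df_stats["Rating_Threshold_Low"].to_numpy()[:, None]
high = df_stats["Rating_Threshold_High"].to_numpy()[:, None]

# in_range[i, j] is True when rating j lies inside fruit i's threshold.
in_range = (rating_values >= low) & (rating_values <= high)

# Zero out counts outside the threshold, then sum across each row.
df_stats["Rating_Threshold_Sum"] = (counts.to_numpy() * in_range).sum(axis=1)
print(df_stats)
```

On the sample data this gives 31, 32 and 26 for Apple, Banana and Kiwi, matching the expected table. The mask avoids the nested Python loop, which may matter for larger frames, at the cost of being a little less obvious to read.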
