Python Pandas - Slice a DataFrame based on values from another table that match its column names

Question

I have two dataframes, df_stats and df_ratings.

df_stats looks like this:

   Fruit   Rating_Threshold_Low  Rating_Threshold_High
1  Apple   4                     7
2  Banana  5                     9
3  Kiwi    6                     8

df_ratings looks like this (the first column is the Fruit and each subsequent column represents a rating):

   Fruit   1   2  3  4   5  6  7   8  9  10
1  Apple   2   4  7  13  2  0  16  1  9  22
2  Banana  6   4  2  1   8  7  5   3  9  0
3  Kiwi    21  4  3  6   8  9  9   8  7  5

My goal is to get, for each fruit, the total number of ratings that fall within that fruit's rating threshold (each fruit's threshold is different). In other words, I want to add a column, Rating_Threshold_Sum, to df_stats that sums the values in df_ratings whose rating lies within the threshold. For example, for Apple, the rating threshold is between 4 and 7 (inclusive), so Rating_Threshold_Sum would be 13+2+0+16 = 31.

As a result, df_stats would have the Rating_Threshold_Sum column:

   Fruit   Rating_Threshold_Low  Rating_Threshold_High  Rating_Threshold_Sum
1  Apple   4                     7                      31
2  Banana  5                     9                      32
3  Kiwi    6                     8                      26

I am not sure exactly how to do that. I know that I may have to use df.apply with a custom function, or loop through each row, but aside from that, I'm not sure of the best way to tackle this problem. Any advice / direction would be much appreciated. Thank you!

Answer 1

You could do something like this:

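sums = []
for i in range(len(df_stats)):
    # Inclusive bounds for this fruit; .iloc is positional, so any index works
    min_v = df_stats["Rating_Threshold_Low"].iloc[i]
    max_v = df_stats["Rating_Threshold_High"].iloc[i]
    values = []
    for z in range(min_v, max_v + 1):
        # Assumes rating columns are labeled with strings ("1".."10")
        # and that both frames list the fruits in the same row order
        x = df_ratings[str(z)].iloc[i]
        values.append(x)
    sums.append(sum(values))
df_stats["Rating_Threshold_Sum"] = sums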

This is rather verbose, and there is probably a better way to do it, but it should work.
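
One possible alternative to looping is to build a boolean mask with NumPy broadcasting and sum the masked ratings row by row. The sketch below is not part of the original answer. It assumes the rating columns are labeled with the strings "1" to "10" and that every fruit in df_stats also appears in df_ratings, and it rebuilds the sample data from the question so that it runs on its own.

```python
import pandas as pd

# Sample data from the question (assumed: 1-based index, string column labels)
df_stats = pd.DataFrame(
    {
        "Fruit": ["Apple", "Banana", "Kiwi"],
        "Rating_Threshold_Low": [4, 5, 6],
        "Rating_Threshold_High": [7, 9, 8],
    },
    index=[1, 2, 3],
)
df_ratings = pd.DataFrame(
    [
        ["Apple", 2, 4, 7, 13, 2, 0, 16, 1, 9, 22],
        ["Banana", 6, 4, 2, 1, 8, 7, 5, 3, 9, 0],
        ["Kiwi", 21, 4, 3, 6, 8, 9, 9, 8, 7, 5],
    ],
    columns=["Fruit"] + [str(r) for r in range(1, 11)],
    index=[1, 2, 3],
)

# Align ratings to df_stats by fruit name rather than by row position
ratings = df_ratings.set_index("Fruit").reindex(df_stats["Fruit"])
rating_values = ratings.columns.astype(int).to_numpy()  # 1, 2, ..., 10

# Column vectors so they broadcast against the row of rating values
low = df_stats["Rating_Threshold_Low"].to_numpy()[:, None]
high = df_stats["Rating_Threshold_High"].to_numpy()[:, None]

# One row per fruit: True where low <= rating <= high
in_range = (rating_values >= low) & (rating_values <= high)

# Zero out counts outside the threshold, then sum each row
df_stats["Rating_Threshold_Sum"] = (ratings.to_numpy() * in_range).sum(axis=1)
print(df_stats)
```

Matching on the Fruit column means the two frames do not need to be in the same row order. If a fruit were missing from df_ratings, its sum would come out as NaN rather than raising an error.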

Solution 1
0 2021-01-26 22:05:54