
Pandas DataFrame: Compute values based on column min/max

I have an NxN DataFrame with values I need to scale to a range of values that signify importance, where 0 is irrelevant and 3 is very important.

The scaling formula depends on the min and max values in each column, which differ from column to column: Col A's range could be 1-12 while Col B's range could be 1M to 45M.

Here's the formula I'm using:

min_importance + ((max_importance - min_importance) / (max_spec_value - min_spec_value)) * (spec_value - min_spec_value)
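For a single value, the formula above can be sketched as a small Python function. The name `scale_value` and the default importance range of 0 to 3 are assumptions made for illustration; the numbers in the usage line are the retail-price minimum and maximum from the DataFrame sample in this question.

```python
def scale_value(spec_value, min_spec_value, max_spec_value,
                min_importance=0.0, max_importance=3.0):
    """Linearly map spec_value from [min_spec_value, max_spec_value]
    onto [min_importance, max_importance]."""
    slope = (max_importance - min_importance) / (max_spec_value - min_spec_value)
    return min_importance + slope * (spec_value - min_spec_value)


# Retail prices in the sample range from 699.00 to 3499.00.
d500_score = scale_value(2000.00, 699.00, 3499.00)
print(round(d500_score, 4))  # 1.3939
```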

How do I create a new DataFrame or dictionary with scaled values for each column, while retaining the index, which is needed later for identification?

I tried creating a function with the above formula and using apply() to call it for each row, but I can't pass the column min/max to the function, so that doesn't work.

DataFrame sample ("Body: retail price" and "Body: sensor resolution" are columns):

                       Body: retail price  Body: sensor resolution
Body name
Nikon D500                        2000.00                 20668416   
Nikon D7000                       1200.00                 16084992   
Sony Alpha 7R II                  3199.00                 42177408   
Canon EOS 5D Mark III             3499.00                 22118400   
Canon 7D Mark II                  1799.00                 19961856   
iPhone 6 (front)                   699.00                  1000000   
iPhone 6 (rear)                    699.00                  7990272   
Fujifilm X-T1                     1299.95                 15980544   
Fujifilm X-T2                     1599.00                 24000000

Min-max normalization can be done with:

(df - df.min()) / (df.max() - df.min())
Out: 
                       Body: retail price  Body: sensor resolution
Body name                                                         
Nikon D500                       0.464643                 0.477651
Nikon D7000                      0.178929                 0.366341
Sony Alpha 7R II                 0.892857                 1.000000
Canon EOS 5D Mark III            1.000000                 0.512864
Canon 7D Mark II                 0.392857                 0.460492
iPhone 6 (front)                 0.000000                 0.000000
iPhone 6 (rear)                  0.000000                 0.169760
Fujifilm X-T1                    0.214625                 0.363805
Fujifilm X-T2                    0.321429                 0.558559

You don't need apply. df.min() returns a Series indexed by column name, and when you compute df - df.min(), pandas aligns that index with the DataFrame's columns and subtracts each column's minimum from every value in that column. This is called broadcasting, and it makes the task easier.
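As a self-contained sketch of this, the snippet below rebuilds the sample DataFrame from the question and normalizes it three ways: the implicit form, the explicit `DataFrame.sub` / `DataFrame.div` form with `axis="columns"`, and a column-wise `apply`. The variable names (`col_min`, `col_max`, `explicit`, `via_apply`) are chosen here for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Body: retail price": [2000.00, 1200.00, 3199.00, 3499.00, 1799.00,
                               699.00, 699.00, 1299.95, 1599.00],
        "Body: sensor resolution": [20668416, 16084992, 42177408, 22118400,
                                    19961856, 1000000, 7990272, 15980544,
                                    24000000],
    },
    index=pd.Index(
        ["Nikon D500", "Nikon D7000", "Sony Alpha 7R II",
         "Canon EOS 5D Mark III", "Canon 7D Mark II", "iPhone 6 (front)",
         "iPhone 6 (rear)", "Fujifilm X-T1", "Fujifilm X-T2"],
        name="Body name",
    ),
)

col_min = df.min()  # Series indexed by column name
col_max = df.max()

# Implicit form: the Series index is aligned with df's columns.
normalized = (df - col_min) / (col_max - col_min)

# Explicit equivalent that spells out the alignment axis.
explicit = df.sub(col_min, axis="columns").div(col_max - col_min, axis="columns")

# Column-wise apply gives the same result, because each column is passed
# to the function as a Series that knows its own min and max.
via_apply = df.apply(lambda col: (col - col.min()) / (col.max() - col.min()))

print(normalized.round(6))
```

All three results keep the original "Body name" index, so rows can still be identified afterwards.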

If you have different importance levels for each column, the best approach is to store them in a DataFrame:

importances = pd.DataFrame({'max_imp': [1, 3], 'min_imp': [0, 0]}, index=df.columns)
importances
Out: 
                         max_imp  min_imp
Body: retail price             1        0
Body: sensor resolution        3        0

Now, using the same principle, you can adjust your formula:

importances['min_imp'] + ((importances['max_imp'] - importances['min_imp']) / (df.max() - df.min())) * (df - df.min())
Out: 
                       Body: retail price  Body: sensor resolution
Body name                                                         
Nikon D500                       0.464643                 1.432952
Nikon D7000                      0.178929                 1.099024
Sony Alpha 7R II                 0.892857                 3.000000
Canon EOS 5D Mark III            1.000000                 1.538591
Canon 7D Mark II                 0.392857                 1.381475
iPhone 6 (front)                 0.000000                 0.000000
iPhone 6 (rear)                  0.000000                 0.509280
Fujifilm X-T1                    0.214625                 1.091415
Fujifilm X-T2                    0.321429                 1.675676

Note that the index of importances and the columns of the actual DataFrame must match. In this example, the first column's range is converted to [0, 1] and the second column's range to [0, 3].
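Putting the pieces together, a minimal sketch might wrap the per-column scaling in a helper that first checks the labels line up. The function name `scale_columns` and the `ValueError` check are assumptions added here, not part of the original answer; the final `to_dict` call covers the case where a dictionary keyed by the index is wanted instead of a DataFrame.

```python
import pandas as pd


def scale_columns(df, importances):
    """Rescale each column of df to its own [min_imp, max_imp] range."""
    # Every column needs a row in importances, otherwise label alignment
    # would silently produce NaN for that column.
    missing = df.columns.difference(importances.index)
    if len(missing) > 0:
        raise ValueError(f"no importance range for columns: {list(missing)}")
    col_min = df.min()
    slope = (importances["max_imp"] - importances["min_imp"]) / (df.max() - col_min)
    return (
        (df - col_min)
        .mul(slope, axis="columns")
        .add(importances["min_imp"], axis="columns")
    )


# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Body: retail price": [2000.00, 1200.00, 3199.00, 3499.00, 1799.00,
                               699.00, 699.00, 1299.95, 1599.00],
        "Body: sensor resolution": [20668416, 16084992, 42177408, 22118400,
                                    19961856, 1000000, 7990272, 15980544,
                                    24000000],
    },
    index=pd.Index(
        ["Nikon D500", "Nikon D7000", "Sony Alpha 7R II",
         "Canon EOS 5D Mark III", "Canon 7D Mark II", "iPhone 6 (front)",
         "iPhone 6 (rear)", "Fujifilm X-T1", "Fujifilm X-T2"],
        name="Body name",
    ),
)
importances = pd.DataFrame(
    {"min_imp": [0, 0], "max_imp": [1, 3]}, index=df.columns
)

scaled = scale_columns(df, importances)

# Dictionary form, keyed by the original index labels.
as_dict = scaled.to_dict(orient="index")
print(scaled.round(6))
```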


 