R: generating a column with values based on a criterion

I have a dataset with two columns, as shown below:

   w     p
   0.5   0.5267
   0.5   0.5239
   1.0   0.5267
   1.0   0.5267
   1.0   0.5267
   0.5   0.3870
   0.5   0.3566
   1.0   0.4914
   1.0   0.4914  
   0.125 0.5267 
   0.125 0.5239 
   0.125 0.3870 
   0.125 0.3844 
   0.125 0.4942 
   0.125 0.4914 
   0.125 0.3566 
   0.125 0.3540 
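
For anyone who wants to reproduce the problem, the table above can be rebuilt as a data frame. This is only a convenience sketch; the name `df` is an assumption, chosen to match the code in the answer.

```r
# Rebuild the sample data shown above.
# `df` is an assumed name; the question does not name the dataset.
df <- data.frame(
  w = c(0.5, 0.5, 1, 1, 1, 0.5, 0.5, 1, 1, rep(0.125, 8)),
  p = c(0.5267, 0.5239, 0.5267, 0.5267, 0.5267, 0.3870, 0.3566,
        0.4914, 0.4914, 0.5267, 0.5239, 0.3870, 0.3844, 0.4942,
        0.4914, 0.3566, 0.3540)
)

# Inspect the structure: 17 rows, two numeric columns.
str(df)
```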

I am trying to create a third column based on the criteria below:

Step 1: Start with row 1 and check the value in column w.
        Row 1, column w is not 1.
Step 2: If the value in column w is not 1, read the next value in column w.
        Read the column w value in row 2.
Step 3: Repeat step 2 until the values read from column w sum to 1.
        Column w, rows 1 and 2: 0.5 + 0.5 = 1
Step 4: Read the corresponding values in column p.
        0.5267, 0.5239
Step 5: Multiply the values in column p by the corresponding values in column w.
        0.5267*0.5, 0.5239*0.5
Step 6: Add the products from step 5.
        0.5267*0.5 + 0.5239*0.5
Step 7: Divide each value in column p by the sum from step 6.
        0.5267/(0.5267*0.5 + 0.5239*0.5)
        0.5239/(0.5267*0.5 + 0.5239*0.5)
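
The steps above can be sketched as an explicit loop. This is a rough illustration rather than the code actually used; it assumes that every run of consecutive w values adds up to exactly 1 and never overshoots, and the variable names (`result`, `start`, `total`) are made up for the example.

```r
w <- c(0.5, 0.5, 1, 1, 1, 0.5, 0.5, 1, 1, rep(0.125, 8))
p <- c(0.5267, 0.5239, 0.5267, 0.5267, 0.5267, 0.3870, 0.3566,
       0.4914, 0.4914, 0.5267, 0.5239, 0.3870, 0.3844, 0.4942,
       0.4914, 0.3566, 0.3540)

result <- numeric(length(w))
start <- 1   # first row of the current block (hypothetical helper)
total <- 0   # running sum of w within the current block

for (i in seq_along(w)) {
  total <- total + w[i]
  # Keep reading rows until the weights in the block add up to 1.
  # all.equal() tolerates tiny floating-point differences.
  if (isTRUE(all.equal(total, 1))) {
    idx <- start:i
    # Divide each p in the block by the block's weighted sum of p.
    result[idx] <- p[idx] / sum(w[idx] * p[idx])
    start <- i + 1
    total <- 0
  }
}

print(round(result, 3))
```

Rows where w is already 1 form a block of their own, so their result is p / (1 * p) = 1.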

The expected output is as follows:

   w     p        Result
   0.5   0.5267   0.5267/(0.5267*0.5 +  0.5239*0.5) 
   0.5   0.5239   0.5239/(0.5267*0.5 +  0.5239*0.5)
   1.0   0.5267   1
   1.0   0.5267   1
   1.0   0.5267   1 
   0.5   0.3870   0.3870/(0.3870*0.5 +  0.3566*0.5)
   0.5   0.3566   0.3566/(0.3870*0.5 +  0.3566*0.5)
   1.0   0.4914   1
   1.0   0.4914   1
   0.125 0.5267   0.5267/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.5239   0.5239/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3870   0.3870/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3844   0.3844/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.4942   0.4942/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.4914   0.4914/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3566   0.3566/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3540   0.3540/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)

I could do this using for loops and ifelse statements, but I am wondering if there is a more elegant way to accomplish this. Thanks.

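We can create a grouping variable from the cumulative sum of w: ceiling(cumsum(w)) gives the same group number to every run of rows whose weights add up to 1. Dividing p by sum(w * p) within each group then gives the result.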

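library(dplyr)

df %>%
  # Rows whose cumulative w falls in the same (n - 1, n] interval share a group
  group_by(gr = ceiling(cumsum(w))) %>%
  # Divide each p by the weighted sum of p within its group
  mutate(result = p/sum(w * p)) %>%
  ungroup() %>%
  # Drop the helper grouping column
  select(-gr)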


# A tibble: 17 x 3
#       w     p result
#   <dbl> <dbl>  <dbl>
# 1 0.5   0.527  1.00 
# 2 0.5   0.524  0.997
# 3 1     0.527  1    
# 4 1     0.527  1    
# 5 1     0.527  1    
# 6 0.5   0.387  1.04 
# 7 0.5   0.357  0.959
# 8 1     0.491  1    
# 9 1     0.491  1    
#10 0.125 0.527  1.20 
#11 0.125 0.524  1.19 
#12 0.125 0.387  0.880
#13 0.125 0.384  0.874
#14 0.125 0.494  1.12 
#15 0.125 0.491  1.12 
#16 0.125 0.357  0.811
#17 0.125 0.354  0.805
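
To see why the grouping works, it may help to look at the group ids on their own. The sketch below uses only base R; `gr_safe` is a hypothetical defensive variant, not part of the original answer, for data whose weights are not exactly representable in binary (0.1, for example), where accumulated rounding error could in principle push a cumulative sum just past a whole number.

```r
w <- c(0.5, 0.5, 1, 1, 1, 0.5, 0.5, 1, 1, rep(0.125, 8))

# Cumulative sum reaches a whole number at the end of every block,
# so ceiling() maps each row of a block to the same integer.
gr <- ceiling(cumsum(w))
print(gr)
# 1 1 2 3 4 5 5 6 7 8 8 8 8 8 8 8 8

# Assumed safeguard: round away tiny floating-point noise before ceiling().
gr_safe <- ceiling(round(cumsum(w), 8))
```

With the sample data both versions agree, because 0.5 and 0.125 are exact in binary floating point.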

The same can be done in data.table:

library(data.table)
# Convert df to a data.table by reference and add result within each group
setDT(df)[, result := p / sum(w * p), by = ceiling(cumsum(w))]
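
If neither package is available, the same grouping idea can also be written in base R with `ave()`. This is an alternative sketch, not part of the original answer; it rebuilds `df` from the question's data so that it runs on its own.

```r
df <- data.frame(
  w = c(0.5, 0.5, 1, 1, 1, 0.5, 0.5, 1, 1, rep(0.125, 8)),
  p = c(0.5267, 0.5239, 0.5267, 0.5267, 0.5267, 0.3870, 0.3566,
        0.4914, 0.4914, 0.5267, 0.5239, 0.3870, 0.3844, 0.4942,
        0.4914, 0.3566, 0.3540)
)

# Same group ids as in the dplyr and data.table versions.
gr <- ceiling(cumsum(df$w))

# ave() returns the per-group sum of w * p, repeated for each row of the group.
df$result <- df$p / ave(df$w * df$p, gr, FUN = sum)

print(df)
```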
