
R Change smallest value in group based on condition

I would like to know how to change the smallest non-zero value in a group if the count of a condition in that group is 1.

For example, given the data frame:

df1 <- data.frame(x = rep(letters[1:3], each = 4),
                  y = rep('var', 12),
                  z = c(10, 0, 'x', 40, 1, 2, 3, 6, 1, 'x', 'x', 6))

df1

   x   y  z
1  a var 10
2  a var  0
3  a var  x
4  a var 40
5  b var  1
6  b var  2
7  b var  3
8  b var  6
9  c var  1
10 c var  x
11 c var  x
12 c var  6

I would like df1[1, 3] to change to "x", as there is only one "x" in group a of column x and 10 is the smallest non-zero value in that group, so as to obtain the data frame:

   x   y  z
1  a var  x
2  a var  0
3  a var  x
4  a var 40
5  b var  1
6  b var  2
7  b var  3
8  b var  6
9  c var  1
10 c var  x
11 c var  x
12 c var  6

Thanks!

We group by 'x' and build an if/else condition on the count of 'x' values in 'z'. If the count is 1, we replace with 'x' the values in 'z' that equal the min of the numeric values (after the 0 is converted to NA with na_if); otherwise 'z' is returned unchanged.

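library(dplyr)
library(stringr)
df1 %>% 
   group_by(x) %>% 
   # only groups with exactly one 'x' in z are modified
   mutate(z = if(sum(z == 'x') == 1) replace(z, 
       # smallest numeric value, with '0' set to NA and 'x' dropped by str_subset
       z == min(as.numeric(str_subset(na_if(z, '0'), '^[0-9.]+$')),
           na.rm = TRUE), 'x') else z) %>% 
   ungroup()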

-output

# A tibble: 12 × 3
   x     y     z    
   <chr> <chr> <chr>
 1 a     var   x    
 2 a     var   0    
 3 a     var   x    
 4 a     var   40   
 5 b     var   1    
 6 b     var   2    
 7 b     var   3    
 8 b     var   6    
 9 c     var   1    
10 c     var   x    
11 c     var   x    
12 c     var   6    
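
For comparison, the same per-group logic can be sketched in base R. This is only an illustration, not part of the answer above: `mark_smallest` is a hypothetical helper name, and the sketch assumes that any entry of `z` that does not parse as a number (such as "x") should be ignored when looking for the minimum.

```r
# Hypothetical helper: within one group, replace the smallest non-zero
# numeric entry with "x" when the group holds exactly one "x".
mark_smallest <- function(z) {
  if (sum(z == "x") != 1) return(z)        # condition not met: leave group as is
  num <- suppressWarnings(as.numeric(z))   # "x" becomes NA
  ok  <- !is.na(num) & num != 0            # non-zero numeric entries
  if (!any(ok)) return(z)                  # nothing to replace
  z[ok & num == min(num[ok])] <- "x"
  z
}

df1 <- data.frame(x = rep(letters[1:3], each = 4),
                  y = rep("var", 12),
                  z = c(10, 0, "x", 40, 1, 2, 3, 6, 1, "x", "x", 6))

# ave() applies the helper within each level of x and keeps the row order
df1$z <- ave(df1$z, df1$x, FUN = mark_smallest)
```

Because the minimum is taken on numeric values rather than on strings, this variant would also handle a group containing, say, "9" and "10".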

I think akrun's solution is better, but maybe just as an idea, and because I like data.table more than dplyr:

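library(data.table)
df1 <- as.data.table(df1)

# z is character, so convert to numeric before taking the minimum ("x" becomes NA);
# min() on the character column would compare strings, e.g. "10" < "9"
num <- suppressWarnings(as.numeric(df1$z))

for (i in unique(df1$x)) {
  # only act on groups that contain exactly one "x"
  if (df1[x == i & z == "x", .N] == 1) {
    cand <- df1$x == i & !is.na(num) & num != 0   # non-zero numeric rows of the group
    hit  <- cand & num == min(num[cand])
    df1[hit, z := "x"]
  }
}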

And the output:

 > df1
    x   y  z
 1: a var  x
 2: a var  0
 3: a var  x
 4: a var 40
 5: b var  1
 6: b var  2
 7: b var  3
 8: b var  6
 9: c var  1
10: c var  x
11: c var  x
12: c var  6
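
As a possible variation, the loop can probably be avoided by letting data.table do the grouping with `by`. The sketch below is an assumption about how that might look rather than the method from the answer above. It rebuilds the example data under the name `dt` (chosen here only for illustration) and compares values numerically.

```r
library(data.table)

# Example data rebuilt so the sketch is self-contained; z is character
dt <- data.table(x = rep(letters[1:3], each = 4),
                 y = "var",
                 z = c("10", "0", "x", "40", "1", "2", "3", "6", "1", "x", "x", "6"))

dt[, z := {
  num <- suppressWarnings(as.numeric(z))   # "x" becomes NA
  ok  <- !is.na(num) & num != 0            # non-zero numeric entries of the group
  # replace only when the group has exactly one "x" and a candidate value exists
  if (sum(z == "x") == 1 && any(ok)) replace(z, ok & num == min(num[ok]), "x") else z
}, by = x]
```

Here `:=` updates `z` by reference within each group of `x`, so no explicit subsetting per group is needed.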
