
How to get values in a new column based on another column, grouped by columns

I am trying to create a column desired_output whose values are derived from the value column, grouped by grp_1 and grp_2:

If every value in the group's value column is unique, the output should be NA.

If one value occurs more often than any other, the whole group gets that value.

If the values occur equally often, the whole group gets the maximum value.

grp_1 = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A")
grp_2 = c("a","a","a","a","a","b","b","b","b","c","c","c","c","d","d","d","d","e","e","e","e")
value = c(1,2,3,3,4,1,2,3,4,1,1,2,2,1,2,4,4,1,3,3,3)
desired_output = c(3,3,3,3,3,NA,NA,NA,NA,2,2,2,2,4,4,4,4,3,3,3,3)

df = data.frame(grp_1, grp_2, value, desired_output)

I got stuck after computing the count of repeated values:

library(dplyr)

# Number each element within its run of consecutive equal values
func <- function(x) {
  unlist(lapply(rle(x)$lengths, seq_len))
}

df <- group_by(df, grp_1, grp_2)
df_1 <- mutate(df, common = as.numeric(func(value)))

Thanks in advance.

Hope this helps!

library(dplyr)
library(modeest)

final_df <- df %>%
  group_by(grp_1, grp_2) %>%
  # all values unique -> NA; all counts equal -> max; otherwise the most frequent value
  mutate(desired_output = ifelse(n() == length(unique(value)),
                                 NA,
                                 ifelse(length(unique(table(value))) == 1,
                                        max(value),
                                        mlv(value, method = 'mfv')[['M']]))) %>%
  data.frame()
final_df

Output is:

   grp_1 grp_2 value desired_output
1      A     a     1              3
2      A     a     2              3
3      A     a     3              3
4      A     a     3              3
5      A     a     4              3
6      A     b     1             NA
7      A     b     2             NA
8      A     b     3             NA
9      A     b     4             NA
10     A     c     1              2
11     A     c     1              2
12     A     c     2              2
13     A     c     2              2
14     A     d     1              4
15     A     d     2              4
16     A     d     4              4
17     A     d     4              4
18     A     e     1              3
19     A     e     3              3
20     A     e     3              3
21     A     e     3              3
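
For readers who would rather not depend on modeest, here is a minimal base R sketch of the same three rules. The helper name group_mode is made up for illustration. Resolving partial ties (for example 1, 1, 2, 2, 3) by taking the largest of the most frequent values is my assumption, since the question only describes the case where all values tie. It also assumes value converts cleanly to and from the labels that table() produces.

```r
# Hypothetical helper: applies the three rules from the question to one group
group_mode <- function(x) {
  counts <- table(x)
  if (all(counts == 1)) return(NA_real_)        # every value is unique
  top <- names(counts)[counts == max(counts)]   # most frequent value(s)
  max(as.numeric(top))                          # ties resolved by the largest value
}

df <- data.frame(
  grp_1 = rep("A", 21),
  grp_2 = rep(c("a", "b", "c", "d", "e"), times = c(5, 4, 4, 4, 4)),
  value = c(1, 2, 3, 3, 4, 1, 2, 3, 4, 1, 1, 2, 2, 1, 2, 4, 4, 1, 3, 3, 3)
)

# ave() applies the helper within each grp_1/grp_2 combination and
# recycles the single result across the rows of that group
df$desired_output <- ave(df$value, df$grp_1, df$grp_2, FUN = group_mode)
df
```

On the sample data this should reproduce the desired_output column from the question.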


#sample data
structure(list(grp_1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"), 
    grp_2 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
    3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("a", 
    "b", "c", "d", "e"), class = "factor"), value = c(1, 2, 3, 
    3, 4, 1, 2, 3, 4, 1, 1, 2, 2, 1, 2, 4, 4, 1, 3, 3, 3)), .Names = c("grp_1", 
"grp_2", "value"), row.names = c(NA, -21L), class = "data.frame")

In case someone prefers data.table:

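data.table::setDT(df)

# Largest duplicated value per group; a group with no duplicates yields -Inf
# (max() of an empty vector, with a warning), which is then converted to NA
df[, desired_outcome := max(value[duplicated(value)]), by = c("grp_1", "grp_2")
  ][is.infinite(desired_outcome), desired_outcome := NA]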
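
One caveat: the one-liner above returns the largest duplicated value, which matches the most frequent value on the sample data but not in general (for 1, 1, 1, 5, 5 it gives 5 rather than 1). A possible count-based variant is sketched below. The object names dt, counts and best are mine, and breaking ties by the larger value is an assumption based on the third rule in the question.

```r
library(data.table)

dt <- data.table(
  grp_1 = "A",
  grp_2 = rep(c("a", "b", "c", "d", "e"), times = c(5, 4, 4, 4, 4)),
  value = c(1, 2, 3, 3, 4, 1, 2, 3, 4, 1, 1, 2, 2, 1, 2, 4, 4, 1, 3, 3, 3)
)

# Count how often each value occurs within each group
counts <- dt[, .N, by = .(grp_1, grp_2, value)]

# Within each group, put the highest count first, larger value first on ties
setorder(counts, grp_1, grp_2, -N, -value)

# Take the first row per group; if even the top count is 1, all values are unique
best <- counts[, .(desired_output = if (N[1L] > 1L) value[1L] else NA_real_),
               by = .(grp_1, grp_2)]

# Update join: copy the per-group result back onto every row of dt
dt[best, desired_output := i.desired_output, on = .(grp_1, grp_2)]
dt
```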

