How to get values in a new column based on another column, grouped by columns
I am trying to get a column desired_output whose values are derived from the value column, grouped by grp_1 and grp_2.
That is:
if all values in the value column are unique within a group, the output for that group should be NA;
if one value repeats more often than any other, the entire group takes that most-repeated value;
if values repeat an equal number of times, the entire group takes the maximum of those values.
grp_1 = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A")
grp_2 = c("a","a","a","a","a","b","b","b","b","c","c","c","c","d","d","d","d","e","e","e","e")
value =c(1,2,3,3,4,1,2,3,4,1,1,2,2,1,2,4,4,1,3,3,3)
desired_output =c(3,3,3,3,3,NA,NA,NA,NA,2,2,2,2,4,4,4,4,3,3,3,3)
df = data.frame(grp_1,grp_2,value,desired_output)
I have been stuck since getting the count of repeated values:
library(dplyr)

# Running count within each run of consecutive equal values
func <- function(x) {
  unlist(lapply(rle(x)$lengths, seq_len))
}
df <- group_by(df, grp_1, grp_2)
df_1 <- mutate(df, common = as.numeric(func(value)))
Thanks in advance.
Hope this helps!
library(dplyr)
library(modeest)

final_df <- df %>%
  group_by(grp_1, grp_2) %>%
  mutate(desired_output = ifelse(n() == length(unique(value)),
                                 NA,            # all values in the group are distinct
                                 ifelse(length(unique(table(value))) == 1,
                                        max(value),  # every value is equally frequent
                                        mlv(value, method = 'mfv')[['M']]))) %>%  # most frequent value
  data.frame()
final_df
Output is:
   grp_1 grp_2 value desired_output
1      A     a     1              3
2      A     a     2              3
3      A     a     3              3
4      A     a     3              3
5      A     a     4              3
6      A     b     1             NA
7      A     b     2             NA
8      A     b     3             NA
9      A     b     4             NA
10     A     c     1              2
11     A     c     1              2
12     A     c     2              2
13     A     c     2              2
14     A     d     1              4
15     A     d     2              4
16     A     d     4              4
17     A     d     4              4
18     A     e     1              3
19     A     e     3              3
20     A     e     3              3
21     A     e     3              3
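If you would rather not depend on modeest, the three rules from the question can also be written as a small helper and applied per group. This is only a sketch: mode_or_na is a hypothetical name introduced here, not part of dplyr or of the answer above, and it assumes value is numeric.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# NA if every value is distinct, otherwise the most frequent value,
# with ties resolved in favour of the largest value.
mode_or_na <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))   # how often each distinct value occurs
  if (all(counts == 1)) return(NA_real_)
  max(ux[counts == max(counts)])
}

# Same data as in the question
df <- data.frame(
  grp_1 = rep("A", 21),
  grp_2 = rep(c("a", "b", "c", "d", "e"), times = c(5, 4, 4, 4, 4)),
  value = c(1, 2, 3, 3, 4, 1, 2, 3, 4, 1, 1, 2, 2, 1, 2, 4, 4, 1, 3, 3, 3)
)

result <- df %>%
  group_by(grp_1, grp_2) %>%
  mutate(desired_output = mode_or_na(value)) %>%
  ungroup() %>%
  as.data.frame()
```

Because the helper returns a single value per group, mutate() recycles it across all rows of that group, so no nested ifelse() is needed.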
#sample data
structure(list(grp_1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"),
grp_2 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("a",
"b", "c", "d", "e"), class = "factor"), value = c(1, 2, 3,
3, 4, 1, 2, 3, 4, 1, 1, 2, 2, 1, 2, 4, 4, 1, 3, 3, 3)), .Names = c("grp_1",
"grp_2", "value"), row.names = c(NA, -21L), class = "data.frame")
In case someone likes data.table:
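data.table::setDT(df)
# Largest value that occurs more than once in each group; groups with no
# duplicates yield -Inf (with a warning), which is then replaced by NA
df[, desired_outcome := max(value[duplicated(value)]), by = c("grp_1", "grp_2")
   ][is.infinite(desired_outcome), desired_outcome := NA]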
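Note that taking the largest duplicated value matches the sample data, but it can differ from the "most repeated value" rule when a smaller value repeats more often than a larger one. A possible data.table sketch that counts occurrences first is shown here; the table dt and its grp column are made-up illustration data, not from the question.

```r
library(data.table)

# Hypothetical illustration data: in group "x", 1 occurs three times and 5 twice
dt <- data.table(
  grp   = c(rep("x", 5), rep("y", 3)),
  value = c(1, 1, 1, 5, 5, 7, 8, 9)
)

# The largest-duplicated-value approach picks 5 for group "x"
largest_dup <- dt[grp == "x", max(value[duplicated(value)])]

# Count each value per group, then keep the most frequent one
# (ties go to the larger value because of the -value sort key)
counts <- dt[, .N, by = .(grp, value)]
best <- counts[order(grp, -N, -value), .SD[1], by = grp]
best[N == 1, value := NA_real_]   # all values distinct -> NA

# Update join: write the per-group result back onto every row
dt[best, desired_output := i.value, on = "grp"]
```

Here group "x" gets 1 (the most frequent value) rather than 5, and group "y" gets NA because none of its values repeat.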