
Value in a third column based on the grouping of other columns

I need to set a label for each id in column a, based on the values that exist for that id. For example, if id 1 only has "F", the result should be "Female"; if it only has "M", then "Male"; and if it has both, then "Mixed".

This is the base data frame:

    df=data.frame(
      a=c(1,1,1,2,2,3,3,3,3,3),
      b=c("F","M","F","M","M","F","F","F","F","F"))

And this is the expected result:

    df$Result=c("Mixed", "Mixed", "Mixed", "Male", "Male", "Female", "Female", "Female", "Female", "Female")

       a b Result
    1  1 F  Mixed
    2  1 M  Mixed
    3  1 F  Mixed
    4  2 M   Male
    5  2 M   Male
    6  3 F Female
    7  3 F Female
    8  3 F Female
    9  3 F Female
    10 3 F Female

Could someone please help me calculate this df$Result column? Thanks in advance!

After grouping by 'a', check the number of distinct elements in 'b'. If it is greater than 1, return "Mixed"; otherwise return the relabelled value of 'b' ("Female" for "F", "Male" for "M").

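library(dplyr)
df %>%
     # map each value of b to its full label ("F" -> "Female", anything else -> "Male")
     mutate(b1 = c("Male", "Female")[(b == "F") + 1]) %>%
     group_by(a) %>%
     # more than one distinct value of b within an id means "Mixed"
     mutate(Result = case_when(n_distinct(b) > 1 ~ "Mixed", TRUE  ~ b1)) %>%
     select(-b1)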
# A tibble: 10 x 3
# Groups:   a [3]
#       a b     Result
#   <dbl> <chr> <chr> 
# 1     1 F     Mixed 
# 2     1 M     Mixed 
# 3     1 F     Mixed 
# 4     2 M     Male  
# 5     2 M     Male  
# 6     3 F     Female
# 7     3 F     Female
# 8     3 F     Female
# 9     3 F     Female
#10     3 F     Female

Data:

df <- data.frame(
      a=c(1,1,1,2,2,3,3,3,3,3),
      b=c("F","M","F","M","M","F","F","F","F","F"),
      stringsAsFactors = FALSE)

A solution with data.table:

library(data.table)
a = c(1,1,1,2,2,3,3,3,3,3)
b = c("F","M","F","M","M","F","F","F","F","F")
df = data.table(a, b)

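# count the distinct values of b within each id (stored as character so the column can hold labels next)
df[, result := as.character(uniqueN(b)), a]
# one distinct value: expand it to its full label; otherwise the id is "Mixed"
df[, result := ifelse(result == "1", ifelse(b == "M", "Male", "Female"), "Mixed")]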
df
#     a b result
#  1: 1 F  Mixed
#  2: 1 M  Mixed
#  3: 1 F  Mixed
#  4: 2 M   Male
#  5: 2 M   Male
#  6: 3 F Female
#  7: 3 F Female
#  8: 3 F Female
#  9: 3 F Female
# 10: 3 F Female
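
For comparison, the same grouping logic (count the distinct values of 'b' per id, then label) can also be sketched in base R without extra packages. This is only an illustrative sketch: the helper name label_group is hypothetical, and it assumes 'b' is a character column containing only "F" and "M".

```r
# Same example data as in the question.
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  b = c("F", "M", "F", "M", "M", "F", "F", "F", "F", "F"),
  stringsAsFactors = FALSE)

# Hypothetical helper: takes the b values of one id and returns one label per row.
# Assumes x contains only "F" and "M".
label_group <- function(x) {
  if (length(unique(x)) > 1) {
    rep("Mixed", length(x))
  } else {
    rep(c(F = "Female", M = "Male")[[x[1]]], length(x))
  }
}

# ave() applies the helper within each group of 'a' and keeps the original row order.
df$Result <- ave(df$b, df$a, FUN = label_group)
df
```

ave() is used here because it returns a vector aligned with the original rows, so no merge back onto the data frame is needed.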
