How to mutate a new variable in R dplyr that is based on criteria from 2 columns?

I have a dataset which looks like this:

Recipient  ID
(chr)       (chr)  
Smith       C
Wells       S
Wells       S
Jones       S
Jones       N
Wu          C
Wu          N
Wu          S

I want to mutate a new variable that is either "Unique" or "Multiple", based on whether Recipient appears once (Unique), Recipient appears more than once but has the same ID for each occurrence (Unique), or Recipient appears more than once and has more than one distinct ID (Multiple). I've tried to use:

df %>%
 group_by(Recipient, ID) %>%
 mutate(Freq = case_when(
                str_count(Recipient) == 1 & str_count(ID) == 1 ~ "Unique",
                str_count(Recipient) > 2 & str_count(ID) == 1 ~ "Unique",
                str_count(Recipient) > 2 & str_count(ID) > 1 ~ "Multiple"))

When I did this, all the values were "Multiple":

Recipient  ID     Freq
(chr)      (chr)  (chr)
Smith       C     Multiple (should be Unique)
Wells       S     Multiple (should be Unique)
Wells       S     Multiple (should be Unique)
Jones       S     Multiple
Jones       N     Multiple
Wu          C     Multiple
Wu          N     Multiple
Wu          S     Multiple

I've tried multiple times, but can't crack it. Can anyone help to solve this, or recommend an easier way to code this? Thanks!

A possible solution with n_distinct():

library(dplyr)

df %>%
  group_by(Recipient) %>%
  mutate(Freq = ifelse(n_distinct(ID) == 1, "unique", "multiple")) %>%
  ungroup()

# A tibble: 8 x 3
  Recipient ID    Freq
  <chr>     <chr> <chr>
1 Smith     C     unique
2 Wells     S     unique
3 Wells     S     unique
4 Jones     S     multiple
5 Jones     N     multiple
6 Wu        C     multiple
7 Wu        N     multiple
8 Wu        S     multiple
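
As for why the original attempt misbehaves: as far as I can tell, str_count() counts pattern matches inside each string (characters, when no pattern is given) rather than how many rows share a value, and grouping by both Recipient and ID means each group only ever sees one ID. A minimal sketch of the three rules from the question written with case_when(), n() and n_distinct(), keeping the capitalised labels; the name `result` is just for illustration:

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  Recipient = c("Smith", "Wells", "Wells", "Jones", "Jones", "Wu", "Wu", "Wu"),
  ID        = c("C", "S", "S", "S", "N", "C", "N", "S"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Recipient) %>%             # group by Recipient only, not by ID
  mutate(Freq = case_when(
    n() == 1            ~ "Unique",   # recipient appears once
    n_distinct(ID) == 1 ~ "Unique",   # appears several times, always the same ID
    TRUE                ~ "Multiple"  # appears several times with different IDs
  )) %>%
  ungroup()

result
```

The first two branches collapse into the single n_distinct(ID) == 1 test used above, since a recipient that appears once necessarily has one distinct ID; they are spelled out here only to mirror the wording of the question.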

Data
df <- structure(list(Recipient = c("Smith", "Wells", "Wells", "Jones", 
"Jones", "Wu", "Wu", "Wu"), ID = c("C", "S", "S", "S", "N", "C",
"N", "S")), class = "data.frame", row.names = c(NA, -8L))

Here is the update after clarification:

library(dplyr)

df %>% 
  group_by(Recipient) %>% 
  mutate(Freq = paste(Recipient, ID),
         Freq = ifelse(Freq %in% Freq[duplicated(Freq)], "unique", "multiple"),
         Freq = ifelse(Recipient %in% Recipient[duplicated(Recipient)], Freq, "unique"))

  Recipient ID    Freq    
  <chr>     <chr> <chr>   
1 Smith     C     unique  
2 Wells     S     unique  
3 Wells     S     unique  
4 Jones     S     multiple
5 Jones     N     multiple
6 Wu        C     multiple
7 Wu        N     multiple
8 Wu        S     multiple
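
One thing that may be worth checking before picking an approach: the duplicated() version labels each row by its own Recipient/ID pair, so it can disagree with a group-level n_distinct() rule when a recipient has a repeated ID plus a different one. A small sketch of that comparison, assuming a made-up recipient "Lee" that is not in the original data; the column names `by_group`, `by_row` and `key` are only for illustration:

```r
library(dplyr)

# Hypothetical extra case: "Lee" has a repeated ID (S, S) plus a different one (N)
df2 <- data.frame(
  Recipient = c("Wells", "Wells", "Lee", "Lee", "Lee"),
  ID        = c("S", "S", "S", "S", "N"),
  stringsAsFactors = FALSE
)

compared <- df2 %>%
  group_by(Recipient) %>%
  mutate(
    # Group-level rule: one label shared by every row of a recipient
    by_group = ifelse(n_distinct(ID) == 1, "unique", "multiple"),
    # Row-level rule: a row is "unique" if its Recipient/ID pair is repeated
    # within the group, or if the recipient appears only once
    key    = paste(Recipient, ID),
    by_row = ifelse(key %in% key[duplicated(key)], "unique", "multiple"),
    by_row = ifelse(Recipient %in% Recipient[duplicated(Recipient)], by_row, "unique")
  ) %>%
  ungroup() %>%
  select(-key)

compared
```

On this input the two "Lee S" rows come out as "unique" under the row-level rule but "multiple" under the group-level rule, so which one is right depends on how the requirement is read.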
