
Flagging an id based on another column has different values in R

I have a flagging rule that I need to apply.

Here is what my dataset looks like:

    df <- data.frame(id = c(1,1,1,1, 2,2,2,2, 3,3,3,3),
                     key = c("a","a","b","c", "a","b","c","d", "a","b","c","c"),
                     form = c("A","B","A","A", "A","A","A","A", "B","B","B","A"))

    > df
       id key form
    1   1   a    A
    2   1   a    B
    3   1   b    A
    4   1   c    A
    5   2   a    A
    6   2   b    A
    7   2   c    A
    8   2   d    A
    9   3   a    B
    10  3   b    B
    11  3   c    B
    12  3   c    A

I would like to flag ids whose key column contains duplicates, where the third column, form, shows a different form for the duplicated key. The idea is to find out whether an id has taken items from more than one form. I need to add a filtering column as below:

    > df.1
       id key form type
    1   1   a    A multiple
    2   1   a    B multiple
    3   1   b    A multiple
    4   1   c    A multiple
    5   2   a    A single
    6   2   b    A single
    7   2   c    A single
    8   2   d    A single
    9   3   a    B multiple
    10  3   b    B multiple
    11  3   c    B multiple
    12  3   c    A multiple

Eventually I need to get rid of the extra duplicated row that has a different form. To decide which of the duplicated rows to drop, I keep the one whose form has more items for that id.

In a final, separate dataset, I would like to have something like this:

    > df.2
       id key form type
    1   1   a    A multiple
    3   1   b    A multiple
    4   1   c    A multiple
    5   2   a    A single
    6   2   b    A single
    7   2   c    A single
    8   2   d    A single
    9   3   a    B multiple
    10  3   b    B multiple
    11  3   c    B multiple

So the first id has form A dominant, so the A row is kept, and the third id has form B dominant, so the B row is kept.

Any ideas? Thanks!

We can check the number of distinct form values within each id to create the new column, and then filter each group down to its most frequent form (Mode):

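    library(dplyr)

    # Mode() is the helper defined further down; define it before running this
    df.2 <- df %>%
      group_by(id) %>%
      # flag the whole id: more than one distinct form means "multiple"
      mutate(type = if (n_distinct(form) > 1) 'multiple' else 'single') %>%
      # keep only the rows that belong to the most frequent form of that id
      filter(form == Mode(form)) %>%
      ungroup()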

Output:

    > df.2
    # A tibble: 10 × 4
          id key   form  type    
       <dbl> <chr> <chr> <chr>   
     1     1 a     A     multiple
     2     1 b     A     multiple
     3     1 c     A     multiple
     4     2 a     A     single  
     5     2 b     A     single  
     6     2 c     A     single  
     7     2 d     A     single  
     8     3 a     B     multiple
     9     3 b     B     multiple
    10     3 c     B     multiple
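The question also asks for an intermediate df.1 that carries the flag without dropping any rows, and it describes dropping only the extra row of a duplicated key. The sketch below is one possible reading of that, not part of the original answer: it stops before the filter step to get df.1, then keeps one row per (id, key) pair, preferring the form with more rows for that id. The helper column n_form is a name made up for this example, and slice_max() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df <- data.frame(id = c(1,1,1,1, 2,2,2,2, 3,3,3,3),
                 key = c("a","a","b","c", "a","b","c","d", "a","b","c","c"),
                 form = c("A","B","A","A", "A","A","A","A", "B","B","B","A"))

# Step 1: flag each id, keeping all 12 rows
df.1 <- df %>%
  group_by(id) %>%
  mutate(type = if (n_distinct(form) > 1) "multiple" else "single") %>%
  ungroup()

# Step 2: within each (id, key) pair keep the row whose form is
# most common for that id; n_form is a hypothetical helper column
df.2 <- df.1 %>%
  add_count(id, form, name = "n_form") %>%
  group_by(id, key) %>%
  slice_max(n_form, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n_form)
```

On the sample data this yields the same 10 rows as the pipeline above. The two approaches can differ on other data: filtering on the most frequent form removes every row of a minority form, while this variant only removes rows whose key is duplicated within the id.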

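Here, Mode is a small helper that returns the most frequent value:

    Mode <- function(x) {
      ux <- unique(x)                        # distinct values, in order of first appearance
      ux[which.max(tabulate(match(x, ux)))]  # the value with the highest count
    }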
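One detail worth knowing about this helper is how it handles ties. which.max() returns the position of the first maximum, so when two forms are equally frequent, the one that appears first in the group is returned. A small check, using made-up input vectors:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c("A", "B", "A", "A"))  # "A": three A's against one B
Mode(c("B", "A"))            # "B": a tie, so the first value seen is returned
```

If an id could have two forms with the same number of items, the row order in df would therefore decide which form is kept.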
