
Find common values of a column based on groups of another column of a data frame in R

I have a data frame like this:

df<-tibble(id=c("ls1","ls1","ls1","ls2","ls2","ls3","ls5","ls5","ls10","ls10","ls14"),
               target=c("A","A","B","G","H","A","B","B","G","HA","B"))

I would like to flag the values of the target column that are repeated within each id group, and also the values that are shared between different id groups. The result could look something like the table below:

res<-tibble(id=c("ls1","ls1","ls1","ls2","ls2","ls3","ls5","ls5","ls10","ls10","ls14"),
            target=c("A","A","B","G","H","A","B","B","G","HA","B"),
            withinGroup=c(TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE),
            numberofRepwithinGroup=c(2,2,1,1,1,1,2,2,1,1,1),
            betweenGroups=c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE),
            numberofRepbetweenGroups=c(2,2,3,2,0,2,3,3,2,0,3))

Any idea how to do it?

You can do it with a couple of mutate() calls, grouping first by id and target, then by target alone:

library(dplyr)

df |>
  # first group by: rows sharing the same id and target
  group_by(id, target) |>
  # add the within columns (n() is the number of rows in the current group)
  mutate(numberofRepwithinGroup = n(),
         withinGroup            = numberofRepwithinGroup > 1) |>
  # second group by: all rows sharing the same target
  group_by(target) |>
  # add the between columns: number of distinct ids, or 0 if only one id has the target
  mutate(numberofRepbetweenGroups = ifelse(n_distinct(id) == 1, 0, n_distinct(id)),
         betweenGroups            = numberofRepbetweenGroups > 0) |>
  # reorder columns
  select(id, target, withinGroup, numberofRepwithinGroup,
         betweenGroups, numberofRepbetweenGroups) |>
  # remove the grouping that is no longer needed
  ungroup()

# A tibble: 11 x 6
   id    target withinGroup numberofRepwithinGroup betweenGroups numberofRepbetweenGroups
   <chr> <chr>  <lgl>                        <int> <lgl>                            <dbl>
 1 ls1   A      TRUE                             2 TRUE                                 2
 2 ls1   A      TRUE                             2 TRUE                                 2
 3 ls1   B      FALSE                            1 TRUE                                 3
 4 ls2   G      FALSE                            1 TRUE                                 2
 5 ls2   H      FALSE                            1 FALSE                                0
 6 ls3   A      FALSE                            1 TRUE                                 2
 7 ls5   B      TRUE                             2 TRUE                                 3
 8 ls5   B      TRUE                             2 TRUE                                 3
 9 ls10  G      FALSE                            1 TRUE                                 2
10 ls10  HA     FALSE                            1 FALSE                                0
11 ls14  B      FALSE                            1 TRUE                                 3
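
For readers who want to check the logic without dplyr, the same two-step counting can be sketched in base R. This is only an illustrative equivalent, not part of the answer above: the helper names ids_per_target and n_ids are made up for this sketch, and it assumes the same df as in the question, rebuilt here as a plain data.frame.

```r
# Base R sketch of the two-step counting (no packages required).
df <- data.frame(
  id     = c("ls1","ls1","ls1","ls2","ls2","ls3","ls5","ls5","ls10","ls10","ls14"),
  target = c("A","A","B","G","H","A","B","B","G","HA","B"),
  stringsAsFactors = FALSE
)

# Step 1: number of rows sharing the same (id, target) pair.
df$numberofRepwithinGroup <- ave(seq_along(df$target), df$id, df$target, FUN = length)
df$withinGroup <- df$numberofRepwithinGroup > 1

# Step 2: number of distinct ids that contain each target value.
# ids_per_target and n_ids are hypothetical helper names used only here.
ids_per_target <- tapply(df$id, df$target, function(x) length(unique(x)))
n_ids <- as.integer(ids_per_target[df$target])

# A target found in a single id is reported as 0, as in the dplyr version.
df$numberofRepbetweenGroups <- ifelse(n_ids == 1, 0, n_ids)
df$betweenGroups <- df$numberofRepbetweenGroups > 0

print(df)
```

The design mirrors the dplyr pipeline: ave() plays the role of the first grouped mutate(), and tapply() followed by indexing on target plays the role of the second.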

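Here is another option, based on duplicated(). Note that the repetition counts here are the number of additional occurrences (the count minus one), so a value that appears once gets 0: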

library(dplyr)
get_reps <- function(x) as.numeric(table(x)[match(x, names(table(x)))] - 1)
df %>%
    group_by(id) %>%
    mutate(
        withinGroup = duplicated(target) | duplicated(target, fromLast = T),
        numberofRepwithinGroup = get_reps(target)) %>%
    ungroup() %>%
    mutate(
        betweenGroups = duplicated(target) | duplicated(target, fromLast = T),
        numberofRepbetweenGroups = get_reps(target))
## A tibble: 11 x 6
#   id    target withinGroup numberofRepwithinGroup betweenGroups numberofRepbetweenGroups
#   <chr> <chr>  <lgl>                        <dbl> <lgl>                            <dbl>
# 1 ls1   A      TRUE                             1 TRUE                                 2
# 2 ls1   A      TRUE                             1 TRUE                                 2
# 3 ls1   B      FALSE                            0 TRUE                                 3
# 4 ls2   G      FALSE                            0 TRUE                                 1
# 5 ls2   H      FALSE                            0 FALSE                                0
# 6 ls3   A      FALSE                            0 TRUE                                 2
# 7 ls5   B      TRUE                             1 TRUE                                 3
# 8 ls5   B      TRUE                             1 TRUE                                 3
# 9 ls10  G      FALSE                            0 TRUE                                 1
#10 ls10  HA     FALSE                            0 FALSE                                0
#11 ls14  B      FALSE                            0 TRUE                                 3
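
One thing to keep in mind with the duplicated() approach above: after ungroup(), the between-group columns are computed over all rows, so a value repeated only inside a single id would still be flagged. The small sketch below illustrates the difference on made-up data; toy, row_based and id_based are hypothetical names used only for this illustration.

```r
# Hypothetical toy data: "X" is repeated, but only inside group g1.
toy <- data.frame(
  id     = c("g1", "g1", "g2"),
  target = c("X", "X", "Y"),
  stringsAsFactors = FALSE
)

# Row-based check (as in the ungrouped duplicated() step): any repeated row counts.
row_based <- duplicated(toy$target) | duplicated(toy$target, fromLast = TRUE)

# Id-based check: a target is shared only if more than one distinct id has it.
ids_per_target <- tapply(toy$id, toy$target, function(x) length(unique(x)))
id_based <- as.integer(ids_per_target[toy$target]) > 1

print(data.frame(toy, row_based, id_based))
```

On the data from the question the two checks agree on the TRUE/FALSE flags, since every target repeated across rows also occurs in more than one id, but the counts differ (for example G, which occurs in two ids, is reported as 1).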
