
How to count the presence of unique elements in each row of a data set and list them in R

I have a dataset like the following:

Age      Monday Tuesday Wednesday 
6-9        a     b        
6-9        b     a        c
6-9              c        a
9-10       c     c        b
9-10       c     a        b

Using R, I want to get the following data set (where 1 means the element is present in the row and 0 means it is absent):

Age        a     b        c
6-9        1     1        0
6-9        1     1        1
6-9        1     0        1
9-10       0     1        1
9-10       1     1        1
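The target is a row-by-value presence matrix. As a point of comparison, here is a base-R sketch that needs no reshaping. It is not taken from the answers below. It assumes the day columns are named Monday, Tuesday and Wednesday and that empty cells are stored as "" or NA. For each distinct value, it checks which rows contain that value in any day column.

```r
# Sample data (same values as df1 below; "" marks an empty cell)
df1 <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "", "c", "c"),
  Tuesday   = c("b", "a", "c", "c", "a"),
  Wednesday = c("", "c", "a", "b", "b"),
  stringsAsFactors = FALSE
)

day_cols <- c("Monday", "Tuesday", "Wednesday")

# Every distinct non-empty value in the day columns, in sorted order
vals <- sort(setdiff(unique(unlist(df1[day_cols])), c("", NA)))

# For each value, 1 if it appears in any day column of the row, else 0
# (na.rm = TRUE makes NA cells count as "not this value")
presence <- vapply(
  vals,
  function(v) as.integer(rowSums(df1[day_cols] == v, na.rm = TRUE) > 0),
  integer(nrow(df1))
)

result <- data.frame(Age = df1$Age, presence)
result
```

Because each value is tested with a comparison rather than counted, a value that appears on two days of the same row still yields 1.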

This can be done by melting the data to long format with melt and then counting each value per row with table:

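library(reshape2)
# Give each row an id so the counts can be matched back to it
df$New <- seq_len(nrow(df))
s <- melt(df, id.vars = c('Age', 'New'))
# Drop empty cells, whether they are stored as NA or as ""
s <- s[!is.na(s$value) & s$value != '', ]
# Count each value per row; the factor keeps the rows in their original order
s <- as.data.frame.matrix(table(factor(s$New, levels = df$New), s$value))
# table() counts repeats (row 4 has c twice), so turn counts into 1/0 flags
s[] <- lapply(s, function(x) as.integer(x > 0))
s$Age <- df$Age
s
  a b c  Age
1 1 1 0  6-9
2 1 1 1  6-9
3 1 0 1  6-9
4 0 1 1 9-10
5 1 1 1 9-10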
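table() counts occurrences rather than presence. A value that appears twice in one row is therefore counted as 2; row 4 has c on both Monday and Tuesday. Comparing the counts with 0 turns them into the 0/1 flags the question asks for. Here is a minimal illustration on a hand-built long table that holds only row 4's cells. The names long, counts and presence are illustrative only.

```r
# Long format for row 4 only: one (row id, value) pair per day cell
long <- data.frame(New = c("4", "4", "4"), value = c("c", "c", "b"),
                   stringsAsFactors = FALSE)

counts <- table(long$New, long$value)  # occurrences per (row, value)
presence <- (counts > 0) * 1L          # TRUE/FALSE becomes 1/0
counts["4", "c"]    # 2
presence["4", "c"]  # 1
```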

One option with pivot_longer and pivot_wider:

library(dplyr)
library(tidyr)
df1 %>%
  mutate(rn = row_number()) %>%                # row id
  pivot_longer(cols = -c(Age, rn)) %>%         # one row per (row, day) cell
  filter(value != '') %>%                      # drop empty cells
  select(-name) %>%
  distinct() %>%                               # a value repeated in a row counts once
  mutate(val = 1) %>%
  pivot_wider(names_from = value, values_from = val,
              values_fill = list(val = 0)) %>% # absent values become 0
  select(-rn)
# A tibble: 5 x 4
#  Age       a     b     c
#  <chr> <dbl> <dbl> <dbl>
#1 6-9       1     1     0
#2 6-9       1     1     1
#3 6-9       1     0     1
#4 9-10      0     1     1
#5 9-10      1     1     1

Data:

df1 <- structure(list(Age = c("6-9", "6-9", "6-9", "9-10", "9-10"), 
    Monday = c("a", "b", "", "c", "c"), Tuesday = c("b", "a", 
    "c", "c", "a"), Wednesday = c("", "c", "a", "b", "b")),
    class = "data.frame", row.names = c(NA, 
-5L))

A data.table solution, using an ID variable:

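library(data.table)
library(magrittr)
setDT(df)  # converts df to a data.table by reference

# 1 if a value occurs at least once in the row, 0 otherwise
ag <- function(x) as.integer(length(x) > 0)

df[, idx := .I] %>%                                       # row id
  melt(id.vars = c("Age", "idx")) %>%
  .[!is.na(value) & value != "", .(Age, value, idx)] %>%  # drop empty cells
  dcast(idx + Age ~ value, fun.aggregate = ag) %>%        # keep original row order
  .[, -"idx"]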


    Age a b c
1:  6-9 1 1 0
2:  6-9 1 1 1
3:  6-9 1 0 1
4: 9-10 0 1 1
5: 9-10 1 1 1
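dcast() calls fun.aggregate on the values that fall into each (row, value) cell. When fill is not given, it also calls it on a zero-length vector to obtain the fill for empty cells. A presence aggregator therefore only has to map length 0 to 0 and any positive length to 1. The small check below uses the illustrative names presence and cells. It also shows that length(x > 1) is not a threshold test; it is simply length(x).

```r
# Presence aggregator: 1 if the cell received any value, 0 otherwise
presence <- function(x) as.integer(length(x) > 0)

cells <- list(character(0), "c", c("c", "c"))  # empty cell, one hit, two hits
sapply(cells, presence)                        # 0 1 1

# Pitfall: x > 1 has one element per element of x, so this is just length(x)
sapply(cells, function(x) length(x > 1))       # 0 1 2
```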

The data:

df <- read.table(text = "Age      Monday Tuesday Wednesday 
6-9        a     b        NA
6-9        b     a        c
6-9       NA     c        a
9-10       c     c        b
9-10       c     a        b",header = TRUE)
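The answers encode empty cells differently: df1 uses empty strings, while this df uses NA. To make one representation work with every approach, you can convert empty strings to NA before reshaping. The sketch below assumes a small illustrative data frame named d.

```r
# Small example with empty strings marking empty cells (as in df1)
d <- data.frame(Age = c("6-9", "6-9"), Monday = c("a", ""),
                Tuesday = c("b", "c"), Wednesday = c("", "a"),
                stringsAsFactors = FALSE)

d[d == ""] <- NA   # empty strings become missing values
d
```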
