简体   繁体   中英

Count occurrences of a value on melted data using dplyr

I am trying to do a simple table after have melt my data, but using dplyr .

My data look like this

   cluster   21:30   21:45
4        c   alone   alone
6        b       %       %
12       e partner partner
14       b partner partner
20       b   alone   alone
22       c partner partner

With table I can simply

table(dta$cluster)
   a b c d e 
   2 8 5 1 4 

How can I get the same results using melt and summarise ?

 library(dplyr)
 library(reshape2)

 dta %>% 
 melt(id.vars = 'cluster')  %>% 
 group_by(cluster) %>% 
 summarise( n() ) 

What I need exactly is to table the clusters after having melt the data.

So to count correctly this data.frame

 dta %>% 
 melt(id.vars = 'cluster')

The expected output is this one

      cluster variable   value n_cluster
1        a    21:30       .         2
2        a    21:30 nuclear         2
3        a    21:45       .         2
4        a    21:45 nuclear         2
5        b    21:30       %         8
6        b    21:30 partner         8
7        b    21:30   alone         8
8        b    21:30 partner         8
9        b    21:30 partner         8
10       b    21:30 nuclear         8
11       b    21:30 partner         8
12       b    21:30 partner         8
13       b    21:45       %         8
14       b    21:45 partner         8
15       b    21:45   alone         8
16       b    21:45 partner         8
17       b    21:45 partner         8
18       b    21:45 nuclear         8
19       b    21:45 partner         8
20       b    21:45 partner         8
21       c    21:30   alone         5
22       c    21:30 partner         5
23       c    21:30       %         5
24       c    21:30 partner         5
25       c    21:30 partner         5
26       c    21:45   alone         5
27       c    21:45 partner         5
28       c    21:45       %         5
29       c    21:45 partner         5
30       c    21:45 partner         5
31       d    21:30 partner         1
32       d    21:45   alone         1
33       e    21:30 partner         4
34       e    21:30 nuclear         4
35       e    21:30 nuclear         4
36       e    21:30 nuclear         4
37       e    21:45 partner         4
38       e    21:45 nuclear         4
39       e    21:45 nuclear         4
40       e    21:45 nuclear         4

Any idea?

dta = structure(list(cluster = structure(c(3L, 2L, 5L, 2L, 2L, 3L, 
5L, 3L, 1L, 3L, 1L, 2L, 5L, 3L, 2L, 2L, 2L, 2L, 4L, 5L), .Label = c("a", 
"b", "c", "d", "e"), class = "factor"), `21:30` = structure(c(2L, 
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L, 
5L, 5L, 4L), .Label = c(".", "alone", "children", "nuclear", 
"partner", "*", "%"), class = "factor"), `21:45` = structure(c(2L, 
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L, 
5L, 2L, 4L), .Label = c(".", "alone", "children", "nuclear", 
"partner", "*", "%"), class = "factor")), .Names = c("cluster", 
"21:30", "21:45"), row.names = c("4", "6", "12", "14", "20", 
"22", "23", "28", "30", "32", "36", "38", "40", "42", "44", "48", 
"50", "56", "57", "60"), class = "data.frame")

I can't seem to find a good dupe for this, but a simple dplyr idiom will be just using count

count(dta, cluster)
# Source: local data frame [5 x 2]
# 
#   cluster n
# 1       a 2
# 2       b 8
# 3       c 5
# 4       d 1
# 5       e 4

Per your new desired output, you could just left join this result to your melted data set

dta %>% 
  melt(id.vars = 'cluster')  %>% 
  left_join(., count(dta, cluster)) %>%
  arrange(cluster)
#    cluster variable   value n
# 1        a    21:30       . 2
# 2        a    21:30 nuclear 2
# 3        a    21:45       . 2
# 4        a    21:45 nuclear 2
# 5        b    21:30       % 8
# 6        b    21:30 partner 8
# 7        b    21:30   alone 8
#...

When counting the distribution of a variable for repeated observations, one should take into account the number of observations.

In this example

n_episode = 2 

Then the code becomes straightforward

dta %>% 
  melt(id.vars = 'cluster')  %>% 
  group_by(cluster) %>% 
  mutate( n_cluster = n() / n_episode) %>% 
  arrange(cluster)

One can use this result ( n_episode ) in order to calculate mean for groups of different sizes for example.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM