Collapsing columns before summarizing by count

Question

I have what I imagine is a very simple question, but I cannot figure it out after a good deal of board searching and tutorial reading.

I have a df with name entries in columns 5 through 12 that are all of one type. They are name strings (see the example below). All I would like to do is use the aggregate or ddply function (or another, if easier) to collapse those columns and then return the count of each unique entry.

ID | Name 1 | Name 2 | Name 3 
Row 1: 278 | John | Tim | Mike
Row 2: 279 | Tim | Steve | John
Row 3: 280 | Tim | Doug | Dave

So ideally I'd get:

 Tim | 3 
 John | 2
 Mike | 1 
 etc. | 1

I know how this works for one column:

counts=aggregate(numeric(nrow(df)), df[c(4)], length)

But when I use a similar line for multiple columns, it returns the unique combinations of the eight columns, instead of an n x 2 table with the unique entries and their total counts.

counts2=aggregate(numeric(nrow(df)),df[c(5:12)],FUN = function(x) length(unique(x)))

Thank you very much for your help.

Answer 1

Here is one way using dplyr and tidyr:

foo <- data.frame(id = 278:280,
                  Name1 = c("John", "Tim", "Mike"),
                  Name2 = c("Tim", "Steve", "John"),
                  Name3 = c("Tim", "Doug", "Dave"),
                  stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)

foo %>%
    gather(var, names, -id) %>%
    count(names)

#  names n
#1  Dave 1
#2  Doug 1
#3  John 2
#4  Mike 1
#5 Steve 1
#6   Tim 3
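
As a side note, gather() has since been superseded in tidyr by pivot_longer(). A minimal sketch of the same idea with the newer function, assuming tidyr 1.0.0 or later is installed (the name_counts variable is only for illustration):

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer above
foo <- data.frame(id = 278:280,
                  Name1 = c("John", "Tim", "Mike"),
                  Name2 = c("Tim", "Steve", "John"),
                  Name3 = c("Tim", "Doug", "Dave"),
                  stringsAsFactors = FALSE)

# Stack every column except id into one "names" column, then tally it
name_counts <- foo %>%
    pivot_longer(-id, names_to = "var", values_to = "names") %>%
    count(names, sort = TRUE)   # sort = TRUE puts the most frequent name first

name_counts
```

The result is a tibble rather than a plain data frame, so the printed layout differs slightly from the output shown above, but the counts should be the same.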

Answer 2

I'm not as up to speed on the new packages that Hadley has come up with, but here's how I'd solve the problem using the reshape2 package. The idea (same as above) is to collapse the columns into one column and then summarize that data:

library(reshape2)

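# melt() stacks the name columns into a single "value" column;
# dcast() then counts the rows for each distinct value
dcast(data = melt(foo, id.vars = "id"), value ~ ., fun.aggregate = length)
#---
#   value .
# 1  Dave 1
# 2  Doug 1
# 3  John 2
# 4  Mike 1
# 5 Steve 1
# 6   Tim 3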

Answer 3

Reading your data:

txt <- "ID | Name 1 | Name 2 | Name 3 
Row 1: 278 | John | Tim | Mike
Row 2: 279 | Tim | Steve | John
Row 3: 280 | Tim | Doug | Dave "
dat <- read.csv(text = txt, sep = "|", strip.white = TRUE)

You can use the as.data.frame method for tables on the unlisted columns.

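u <- unlist(dat[-1])  # stack the name columns into one vector
# as.character() works whether read.csv returned factors (R < 4.0) or strings
as.data.frame(table(as.character(u)))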
#    Var1 Freq
# 1  Dave    1
# 2  Doug    1
# 3  John    2
# 4  Mike    1
# 5 Steve    1
# 6   Tim    3
