
Collapse columns before aggregating by count

I have what I imagine is a very simple question, but I cannot figure it out after a good deal of board searching and tutorial reading.

I have a df with name entries in columns 5 through 12 that are all of one type: name strings (see the example below). All I would like to do is use aggregate or ddply (or another function, if easier) to collapse those columns and then return the count of each unique entry.

ID | Name 1 | Name 2 | Name 3 
Row 1: 278 | John | Tim | Mike
Row 2: 279 | Tim | Steve | John
Row 3: 280 | Tim | Doug | Dave 

So ideally I'd get:

 Tim | 3 
 John | 2
 Mike | 1 
 etc. | 1 

I know how this works for one column:

counts=aggregate(numeric(nrow(df)), df[c(4)], length)

But when I use a similar line for multiple columns, it returns the unique combinations of the columns, instead of an n x 2 table with the unique entries and their total counts.

counts2=aggregate(numeric(nrow(df)),df[c(5:12)],FUN = function(x) length(unique(x)))

Thank you very much for your help.

Here is one way using dplyr and tidyr:

foo <- data.frame(id = 278:280,
                  Name1 = c("John", "Tim", "Mike"),
                  Name2 = c("Tim", "Steve", "John"),
                  Name3 = c("Tim", "Doug", "Dave"),
                  stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)

foo %>%
    gather(var, names, -id) %>%
    count(names)

#  names n
#1  Dave 1
#2  Doug 1
#3  John 2
#4  Mike 1
#5 Steve 1
#6   Tim 3
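In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below is a variation on the answer above rather than part of it: it assumes tidyr >= 1.0.0 is installed, and the names `name_counts`, `var`, and `name` are illustrative choices. Passing sort = TRUE to count() orders the result by descending count, which is closer to the layout asked for in the question.

```r
library(dplyr)
library(tidyr)

foo <- data.frame(id = 278:280,
                  Name1 = c("John", "Tim", "Mike"),
                  Name2 = c("Tim", "Steve", "John"),
                  Name3 = c("Tim", "Doug", "Dave"),
                  stringsAsFactors = FALSE)

name_counts <- foo %>%
    # Stack every column except id into a single 'name' column
    pivot_longer(cols = -id, names_to = "var", values_to = "name") %>%
    # Count each unique name, most frequent first
    count(name, sort = TRUE)

name_counts
```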

I'm not as up to speed on the new packages that Hadley has come up with, but here's how I'd solve the problem using the reshape2 package. The idea (same as above) is to collapse the columns into one column and then summarize that data:

library(reshape2)

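# With no aggregation function given, dcast() falls back to length;
# passing it explicitly makes the counting intent clear
dcast(data = melt(foo, id.vars = "id"), value ~ ., fun.aggregate = length)
#   value .
# 1  Dave 1
# 2  Doug 1
# 3  John 2
# 4  Mike 1
# 5 Steve 1
# 6   Tim 3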
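Since the question asked about aggregate, the same collapse-then-summarize idea can also be sketched in base R with stack() and aggregate(). This is an alternative to the reshape2 call, not something from the original answer. It assumes the name columns are character vectors (stack() ignores non-vector columns such as factors), and `long` and `counts` are illustrative names.

```r
foo <- data.frame(id = 278:280,
                  Name1 = c("John", "Tim", "Mike"),
                  Name2 = c("Tim", "Steve", "John"),
                  Name3 = c("Tim", "Doug", "Dave"),
                  stringsAsFactors = FALSE)

# Collapse the name columns into one long column:
# 'values' holds the names, 'ind' records the source column
long <- stack(foo[-1])

# Count the rows belonging to each unique name
counts <- aggregate(ind ~ values, data = long, FUN = length)
names(counts) <- c("name", "n")
counts
```

The key difference from the attempt in the question is that aggregate() now groups by a single column, so it returns one row per name instead of one row per combination of columns.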

Reading your data:

txt <- "ID | Name 1 | Name 2 | Name 3 
Row 1: 278 | John | Tim | Mike
Row 2: 279 | Tim | Steve | John
Row 3: 280 | Tim | Doug | Dave "
dat <- read.csv(text = txt, sep = "|", strip.white = TRUE)

You can use the as.data.frame method for tables on the unlisted columns.

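# as.character() covers both cases: factor columns (the read.csv default
# before R 4.0) and character columns (R >= 4.0), where levels(u) is NULL
u <- unlist(dat[-1])
as.data.frame(table(as.character(u)))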
#    Var1 Freq
# 1  Dave    1
# 2  Doug    1
# 3  John    2
# 4  Mike    1
# 5 Steve    1
# 6   Tim    3
