Add a frequency count column for each factor level and reshape the data frame

Question

I have a data frame like so:

 df<- data.frame(region = c("1","1","1","1","1","2","2"),
            loc = c("104","104","104","105","105","106","107"), 
            plntsp = c("A","A", "B", "C", "C", "E", "F"), 
            lepsp = c("Z","Z", "Y", "W", "X", "T", "T"))

And I want to: 1) find the frequencies of plntsp and lepsp for each region and loc subset; 2) make it a long data frame where the plntsp and lepsp columns are collapsed into one column titled sp, and the new count columns are collapsed into one count column called freq.

output<- data.frame(region = c("1","1","1","2","2","1","1","1","1","2","2"),
loc = c("104","104","105","106","107","104","104","105","105","106","107"), 
sp = c("A","B", "C", "E", "F", "Z", "Y","W", "X", "T", "T"), 
freq = c("2","1", "2", "1", "1", "2", "1", "1", "1", "1", "1"))

I have tried:

df<- 
group_by(region,loc) %>%
summarise(freq1= length(unique(plantsp), freq2= length(unique(lepsp))
mutate(sp= df$plantsp &df$lepsp, freq= df$freq1 &df$freq2)

Answer 1

aggregate() would be one option:

rbind(aggregate(list(freq = seq_along(df$plntsp)),
                by = list(region = df$region,loc = df$loc, sp = df$plntsp),
                FUN = length),
      aggregate(list(freq = seq_along(df$plntsp)),
                by = list(region = df$region, loc = df$loc, sp = df$lepsp),
                FUN = length))
#   region loc sp freq
#1       1 104  A    2
#2       1 104  B    1
#3       1 105  C    2
#4       2 106  E    1
#5       2 107  F    1
#6       2 106  T    1
#7       2 107  T    1
#8       1 105  W    1
#9       1 105  X    1
#10      1 104  Y    1
#11      1 104  Z    2

Or use melt() from reshape2 before calling aggregate():

library(reshape2)
opt = melt(data = df, id.vars = c("region", "loc"))
#Warning message:
#attributes are not identical across measure variables; they will be dropped 
aggregate(list(freq=opt$value), opt[c("region","loc","value")], FUN = length)
#   region loc value freq
#1       1 104     A    2
#2       1 104     B    1
#3       1 105     C    2
#4       2 106     E    1
#5       2 107     F    1
#6       2 106     T    1
#7       2 107     T    1
#8       1 105     W    1
#9       1 105     X    1
#10      1 104     Y    1
#11      1 104     Z    2
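
The same reshape-then-count idea can also be sketched in base R alone, for readers who do not have reshape2 installed. This is a minimal sketch and not part of the original answer; the object names long and res are made up for illustration, and stringsAsFactors = FALSE is assumed so that the two species columns can be stacked without factor-level warnings.

```r
# Example data from the question, kept as character columns
df <- data.frame(region = c("1","1","1","1","1","2","2"),
                 loc    = c("104","104","104","105","105","106","107"),
                 plntsp = c("A","A","B","C","C","E","F"),
                 lepsp  = c("Z","Z","Y","W","X","T","T"),
                 stringsAsFactors = FALSE)

# Stack the two species columns into a single long column called sp
long <- rbind(
  data.frame(df[c("region", "loc")], sp = df$plntsp, stringsAsFactors = FALSE),
  data.frame(df[c("region", "loc")], sp = df$lepsp,  stringsAsFactors = FALSE)
)

# Give every row a weight of 1 and sum the weights per region/loc/sp group
long$freq <- 1L
res <- aggregate(freq ~ region + loc + sp, data = long, FUN = sum)
res
```

Summing a column of ones is equivalent to counting rows per group, and it yields the sp and freq column names the question asks for without a separate renaming step.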

Answer 2

Using tidyverse:

library(tidyverse)
df %>% 
  gather(key, sp, plntsp, lepsp) %>%
  group_by(region, loc, sp) %>%
  count(.) %>%
  rename(freq = n)

   region    loc    sp  freq
 1      1    104     A     2
 2      1    104     B     1
 3      1    104     Y     1
 4      1    104     Z     2
 5      1    105     C     2
 6      1    105     W     1
 7      1    105     X     1
 8      2    106     E     1
 9      2    106     T     1
10      2    107     F     1
11      2    107     T     1
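
In current tidyr releases gather() is superseded by pivot_longer(), so the same approach might be written as below. This is a sketch under the assumption that tidyr 1.0.0 or later and a dplyr version whose count() accepts the name argument are installed; it is not the original answer's code.

```r
library(dplyr)
library(tidyr)

# Example data from the question, kept as character columns
df <- data.frame(region = c("1","1","1","1","1","2","2"),
                 loc    = c("104","104","104","105","105","106","107"),
                 plntsp = c("A","A","B","C","C","E","F"),
                 lepsp  = c("Z","Z","Y","W","X","T","T"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # Collapse the two species columns into one long column called sp
  pivot_longer(c(plntsp, lepsp), names_to = "key", values_to = "sp") %>%
  # Count rows per region/loc/sp and name the count column freq directly
  count(region, loc, sp, name = "freq")
res
```

Passing the grouping columns straight to count() replaces the separate group_by() and rename() steps.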

Answer 3

This data.table solution follows the advice from thelatemail to reshape to long format first and then count occurrences.

The melt() function, which reshapes data from wide to long format, is available from two packages: reshape2 and data.table. I prefer the latter for performance reasons and for its concise syntax:

library(data.table)
id_vars = c("region", "loc")
melt(setDT(df), id.vars = id_vars, value.name = "sp")[, .(freq = .N), c(id_vars, "sp")]

    region loc sp freq
 1:      1 104  A    2
 2:      1 104  B    1
 3:      1 105  C    2
 4:      2 106  E    1
 5:      2 107  F    1
 6:      1 104  Z    2
 7:      1 104  Y    1
 8:      1 105  W    1
 9:      1 105  X    1
10:      2 106  T    1
11:      2 107  T    1

Note that the columns have been renamed as requested by the OP. For comparison with the other answers posted so far, the code is even more condensed without renaming the columns:

melt(setDT(df), id.vars = id_vars)[, .N, c(id_vars, "value")]
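
To make the two steps of this approach explicit, the chained call can be split into a reshape step and a count step. The sketch below is illustrative only; the object names dt, long and res are made up, and the data are built as character columns directly with data.table().

```r
library(data.table)

# Example data from the question as a data.table
dt <- data.table(region = c("1","1","1","1","1","2","2"),
                 loc    = c("104","104","104","105","105","106","107"),
                 plntsp = c("A","A","B","C","C","E","F"),
                 lepsp  = c("Z","Z","Y","W","X","T","T"))

id_vars <- c("region", "loc")

# Step 1: reshape to long format; each original row now appears once per
# measured column, with the species code in a column named sp
long <- melt(dt, id.vars = id_vars, value.name = "sp")

# Step 2: count the rows in each region/loc/sp group
res <- long[, .(freq = .N), by = c(id_vars, "sp")]
res
```

The intermediate long table has 14 rows (7 original rows times 2 species columns), which the count step reduces to one row per distinct region/loc/sp combination.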

3 answers

Answer 1: 1, accepted, 2017-07-09 22:28:53
Answer 2: 1, 2017-07-09 23:20:52
Answer 3: 0, 2017-07-10 06:43:14
