Add columns of frequency counts for each factor level and reshape dataframe
I have a dataframe like so:
df<- data.frame(region = c("1","1","1","1","1","2","2"),
loc = c("104","104","104","105","105","106","107"),
plntsp = c("A","A", "B", "C", "C", "E", "F"),
lepsp = c("Z","Z", "Y", "W", "X", "T", "T"))
And I want to:
1) find the frequencies of plntsp and lepsp for each region and loc subset.
2) make it a long dataframe where the plntsp and lepsp columns are collapsed into one column titled sp, and the new count columns are collapsed into one count column called freq.
output<- data.frame(region = c("1","1","1","2","2","1","1","1","1","2","2"),
loc = c("104","104","105","106","107","104","104","105","105","106","107"),
sp = c("A","B", "C", "E", "F", "Z", "Y","W", "X", "T", "T"),
freq = c("2","1", "2", "1", "1", "2", "1", "1", "1", "1", "1"))
I have tried:
df<-
group_by(region,loc) %>%
summarise(freq1= length(unique(plantsp), freq2= length(unique(lepsp))
mutate(sp= df$plantsp &df$lepsp, freq= df$freq1 &df$freq2)
aggregate would be one option:
rbind(aggregate(list(freq = seq_along(df$plntsp)),
by = list(region = df$region,loc = df$loc, sp = df$plntsp),
FUN = length),
aggregate(list(freq = seq_along(df$plntsp)),
by = list(region = df$region, loc = df$loc, sp = df$lepsp),
FUN = length))
# region loc sp freq
#1 1 104 A 2
#2 1 104 B 1
#3 1 105 C 2
#4 2 106 E 1
#5 2 107 F 1
#6 2 106 T 1
#7 2 107 T 1
#8 1 105 W 1
#9 1 105 X 1
#10 1 104 Y 1
#11 1 104 Z 2
Or use melt from reshape2 before using aggregate:
library(reshape2)
opt = melt(data = df, id.vars = c("region", "loc"))
#Warning message:
#attributes are not identical across measure variables; they will be dropped
aggregate(list(freq=opt$value), opt[c("region","loc","value")], FUN = length)
# region loc value freq
#1 1 104 A 2
#2 1 104 B 1
#3 1 105 C 2
#4 2 106 E 1
#5 2 107 F 1
#6 2 106 T 1
#7 2 107 T 1
#8 1 105 W 1
#9 1 105 X 1
#10 1 104 Y 1
#11 1 104 Z 2
Using tidyverse:
library(tidyverse)
df %>%
gather(key, sp, plntsp, lepsp) %>%
group_by(region, loc, sp) %>%
count(.) %>%
rename(x=n)
region loc sp x
1 1 104 A 2
2 1 104 B 1
3 1 104 Y 1
4 1 104 Z 2
5 1 105 C 2
6 1 105 W 1
7 1 105 X 1
8 2 106 E 1
9 2 106 T 1
10 2 107 F 1
11 2 107 T 1
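In current tidyr releases gather() is superseded by pivot_longer(). The following is a minimal sketch of the same pipeline, assuming tidyr 1.0.0 or later and a dplyr version whose count() accepts a name argument. The result name res is only illustrative, and the count column is named freq directly instead of being renamed afterwards.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(region = c("1","1","1","1","1","2","2"),
                 loc    = c("104","104","104","105","105","106","107"),
                 plntsp = c("A","A","B","C","C","E","F"),
                 lepsp  = c("Z","Z","Y","W","X","T","T"))

res <- df %>%
  # Collapse plntsp and lepsp into a single sp column (long format)
  pivot_longer(c(plntsp, lepsp), names_to = "key", values_to = "sp") %>%
  # Count rows per region/loc/sp combination, naming the count column freq
  count(region, loc, sp, name = "freq")

res
```

Passing the grouping columns to count() makes the separate group_by() step unnecessary.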
This data.table solution follows the advice from thelatemail to reshape to long format first and then to count appearances.
The melt() function to reshape data from wide to long format is available from two packages: reshape2 and data.table. I prefer the latter for performance reasons and the concise syntax:
library(data.table)
id_vars = c("region", "loc")
melt(setDT(df), id.vars = id_vars, value.name = "sp")[, .(freq = .N), c(id_vars, "sp")]
    region loc sp freq
 1:      1 104  A    2
 2:      1 104  B    1
 3:      1 105  C    2
 4:      2 106  E    1
 5:      2 107  F    1
 6:      1 104  Z    2
 7:      1 104  Y    1
 8:      1 105  W    1
 9:      1 105  X    1
10:      2 106  T    1
11:      2 107  T    1
Note that the columns have been renamed as requested by the OP. For comparison with the other answers posted so far, the code is even more condensed without renaming the columns:
melt(setDT(df), id.vars = id_vars)[, .N, c(id_vars, "value")]
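The reshape-then-count idea can also be sketched in base R without any packages, which may help to see what the two steps do. This is only an illustration and not part of the answer above; the names long and res are made up for the example.

```r
# Sample data from the question
df <- data.frame(region = c("1","1","1","1","1","2","2"),
                 loc    = c("104","104","104","105","105","106","107"),
                 plntsp = c("A","A","B","C","C","E","F"),
                 lepsp  = c("Z","Z","Y","W","X","T","T"),
                 stringsAsFactors = FALSE)

# Step 1: reshape to long format by stacking the two species columns
long <- rbind(
  data.frame(df[c("region", "loc")], sp = df$plntsp, stringsAsFactors = FALSE),
  data.frame(df[c("region", "loc")], sp = df$lepsp,  stringsAsFactors = FALSE)
)

# Step 2: count the rows in each region/loc/sp combination
long$freq <- 1L
res <- aggregate(freq ~ region + loc + sp, data = long, FUN = sum)

# Sort for readability
res <- res[order(res$region, res$loc, res$sp), ]
res
```

The long table has 14 rows (7 rows times 2 species columns), and counting collapses them to 11 distinct region/loc/sp combinations.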