Aggregating frequency of column entries into separate columns in R
I have a data frame that looks like:
POP<-c(rep("POP1",6), rep("POP2",6), rep("POP3", 6))
IID<-c(rep("POP1_1", 2), rep("POP1_2",2), rep("POP1_3", 2), rep("POP2_1",2), rep("POP2_2",2), rep("POP2_3",2), rep("POP3_1",2), rep("POP3_2",2),rep("POP3_3",2))
Site1<-c(36, 42, 32, 32, 48, 42, 36, 36, 48, 42, 36, 48, 28, 32, 32, 32, 48, 32)
Site2<-c(10, 8, 10, 16, 16, 10, 10, 10, 16, 10, -9, -9, 16, 8, 10, 10, 8, 8)
dat<-cbind(POP, IID, Site1, Site2)
The real data has many more site columns and many more POP groups. I want to go through the site columns one at a time and, for each distinct entry in a column, create a new column containing the frequency of that entry, aggregated over the POP column. -9 denotes missing values. I do not want these to form a column of their own or to contribute to the frequencies.
Ultimately, the data above would look like:
dat
POP   Site1_28  Site1_32  Site1_36  Site1_42  Site1_48  Site2_8  Site2_10  Site2_16
POP1  0         0.333     0.167     0.333     0.167     0.167    0.5       0.333
POP2  0         0         0.5       0.167     0.333     0        0.75      0.25
POP3  0.167     0.667     0         0         0.167     0.5      0.333     0.167
I'm guessing I'll be looking at lapply() over some use of table() and aggregate(), but I really have no idea where to start.
Thank you!
I think this should do what you want. First, we do some data manipulation so that our call to table works. Then, we iterate over the two site columns, computing a prop.table of the site values for each POP value. Finally, we use rbind and cbind to combine the results.
# create data.frame
dat <- data.frame(POP, IID, Site1, Site2,
                  stringsAsFactors = FALSE)

# identify columns containing 'Site'
site_col_names <- names(dat)[grep(pattern = 'Site', x = names(dat))]

# for each site column, recode -9 as NA, and then paste
# the column name in front of every remaining value
for (i in site_col_names) {
  dat[i] <- factor(sapply(dat[i], function(x)
    ifelse(x == -9, NA, paste0(i, '_', x))))
}

# iterate over columns, calculate prop.table within each POP,
# then rbind the POP rows and cbind the per-site blocks
do.call('cbind',
        lapply(site_col_names, function(n) {
          do.call('rbind',
                  by(dat, dat$POP, function(d) prop.table(table(d[n]))))
        }))
      Site1_28  Site1_32  Site1_36  Site1_42  Site1_48  Site2_10  Site2_16   Site2_8
POP1 0.0000000 0.3333333 0.1666667 0.3333333 0.1666667 0.5000000 0.3333333 0.1666667
POP2 0.0000000 0.0000000 0.5000000 0.1666667 0.3333333 0.7500000 0.2500000 0.0000000
POP3 0.1666667 0.6666667 0.0000000 0.0000000 0.1666667 0.3333333 0.1666667 0.5000000
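As a possible alternative, the same frequencies can be obtained from a two-way table(), with prop.table(margin = 1) doing the per-POP normalisation. This is only a sketch under a few assumptions: the data are rebuilt without the IID column, the names site_cols, freq_list and result are my own, and -9 is taken to be the only missing-value code.

```r
# Rebuild the example data from the question (IID is not needed here)
POP   <- rep(c("POP1", "POP2", "POP3"), each = 6)
Site1 <- c(36, 42, 32, 32, 48, 42, 36, 36, 48, 42, 36, 48, 28, 32, 32, 32, 48, 32)
Site2 <- c(10, 8, 10, 16, 16, 10, 10, 10, 16, 10, -9, -9, 16, 8, 10, 10, 8, 8)
dat <- data.frame(POP, Site1, Site2, stringsAsFactors = FALSE)

# Names of the site columns (assumes they all contain 'Site')
site_cols <- grep("Site", names(dat), value = TRUE)

freq_list <- lapply(site_cols, function(n) {
  x <- dat[[n]]
  x[x == -9] <- NA                       # assumption: -9 is the only missing code
  tab  <- table(dat$POP, x)              # table() drops NA by default
  freq <- unclass(prop.table(tab, margin = 1))  # row-wise proportions as a plain matrix
  colnames(freq) <- paste0(n, "_", colnames(freq))
  freq
})

# One row per POP, one column per site/value combination
result <- do.call(cbind, freq_list)
round(result, 3)
```

Because the site values stay numeric when table() is called, the columns come out in numeric order (Site2_8 before Site2_10), which matches the layout asked for in the question. If a POP group had only missing values for some site, its row in that block would be NaN, so that case would need separate handling.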