Aggregating frequency of column entries into separate columns in R

I have a data frame that looks like:

POP <- c(rep("POP1", 6), rep("POP2", 6), rep("POP3", 6))
IID <- c(rep("POP1_1", 2), rep("POP1_2", 2), rep("POP1_3", 2), rep("POP2_1", 2), rep("POP2_2", 2), rep("POP2_3", 2), rep("POP3_1", 2), rep("POP3_2", 2), rep("POP3_3", 2))
Site1 <- c(36, 42, 32, 32, 48, 42, 36, 36, 48, 42, 36, 48, 28, 32, 32, 32, 48, 32)
Site2 <- c(10, 8, 10, 16, 16, 10, 10, 10, 16, 10, -9, -9, 16, 8, 10, 10, 8, 8)
# data.frame() rather than cbind(): cbind() would return a character matrix
dat <- data.frame(POP, IID, Site1, Site2)

The real data has many more site columns and many more POP groups. I want to go through the site columns one at a time and, for each distinct entry in a column, create a new column containing the frequency of that entry within each POP group. -9 denotes missing values. I do not want these to get a column of their own or to contribute to the frequencies.

Ultimately, the data above would look like:

dat

POP   Site1_28 Site1_32 Site1_36 Site1_42 Site1_48 Site2_8 Site2_10 Site2_16
POP1  0        0.333    0.167    0.333    0.167    0.167   0.5      0.333
POP2  0        0        0.5      0.167    0.333    0       0.75     0.25    
POP3  0.167    0.667    0        0        0.167    0.5     0.333    0.167

I'm guessing I'll be looking at lapply() over some use of table() and aggregate(), but I really have no idea where to start.

Thank you!

I think this should do what you want. First, we do some data manipulation so that the call to table() works: each -9 is recoded as NA, and every other value is pasted onto its column name and stored as a factor. Then we iterate over the site columns, computing a prop.table() of the site values within each POP group. Finally, we use rbind and cbind to combine the results.

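# create the data.frame
dat <- data.frame(POP, IID, Site1, Site2,
                  stringsAsFactors = FALSE)
# identify the columns whose names contain 'Site'
site_col_names <- grep(pattern = 'Site', x = names(dat), value = TRUE)
# for each site column, recode -9 as NA and paste the column name onto
# every other value; storing the result as a factor keeps a level (and so
# a zero count) for values that never occur in a given POP group
for (i in site_col_names) {
  dat[[i]] <- factor(ifelse(dat[[i]] == -9, NA, paste0(i, '_', dat[[i]])))
}
# for each site column, compute prop.table() within each POP group,
# stack the groups with rbind, then join the site columns with cbind
do.call('cbind',
        lapply(site_col_names, function(n) {
          do.call('rbind',
                  by(dat, dat$POP, function(d) prop.table(table(d[[n]]))))
        }))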

      Site1_28  Site1_32  Site1_36  Site1_42  Site1_48  Site2_10  Site2_16   Site2_8
POP1 0.0000000 0.3333333 0.1666667 0.3333333 0.1666667 0.5000000 0.3333333 0.1666667
POP2 0.0000000 0.0000000 0.5000000 0.1666667 0.3333333 0.7500000 0.2500000 0.0000000
POP3 0.1666667 0.6666667 0.0000000 0.0000000 0.1666667 0.3333333 0.1666667 0.5000000
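
Note that the columns above come out in alphabetical order of the pasted labels, so Site2_8 lands after Site2_16. If the numeric order from the question matters, one possible variant is to cross-tabulate POP against the raw numeric values and take row proportions. The following is only a sketch under some assumptions: site_freq is a hypothetical helper name, the missing-value code is assumed to be -9 as in the question, and the site columns are assumed to start with "Site".

```r
# Rebuild the example data from the question (IID is not needed here).
POP   <- rep(c("POP1", "POP2", "POP3"), each = 6)
Site1 <- c(36, 42, 32, 32, 48, 42, 36, 36, 48, 42, 36, 48, 28, 32, 32, 32, 48, 32)
Site2 <- c(10, 8, 10, 16, 16, 10, 10, 10, 16, 10, -9, -9, 16, 8, 10, 10, 8, 8)
dat   <- data.frame(POP, Site1, Site2, stringsAsFactors = FALSE)

# site_freq() is a hypothetical helper, not a function from any package.
# It returns one row per POP group and one column per distinct value.
site_freq <- function(data, site, missing = -9) {
  x <- data[[site]]
  x[x %in% missing] <- NA                       # table() drops NA by default
  # Numeric values are sorted numerically, so 8 comes before 10.
  freq <- prop.table(table(data$POP, x), margin = 1)  # row-wise proportions
  out <- as.data.frame.matrix(freq)
  names(out) <- paste0(site, "_", names(out))
  out
}

site_cols <- grep("^Site", names(dat), value = TRUE)
result <- do.call(cbind, lapply(site_cols, site_freq, data = dat))
round(result, 3)
```

A POP group whose values are all missing for a site would produce NaN in that row (0 divided by 0), so that case may need separate handling.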

