Combining data frames output by a for loop in R

Question

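I have a large data frame like this (only the first three columns are shown):

The data frame is called chr22_hap12.

I want to get the proportion of each number (1, 2, and 3, in that order) in every column and store the results in a data frame.

This is what I have so far: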

for (i in 1:3 ) {

  length(chr22_hap12[,i]) -> total_snps
  sum(chr22_hap12[,i]==1,na.rm=FALSE) -> counts_ancestry_1
  sum(chr22_hap12[,i]==2,na.rm=FALSE) -> counts_ancestry_2
  sum(chr22_hap12[,i]==3,na.rm=FALSE) -> counts_ancestry_3

  (counts_ancestry_1*100)/total_snps -> ancestry_1_perc
  (counts_ancestry_2*100)/total_snps -> ancestry_2_perc
  (counts_ancestry_3*100)/total_snps -> ancestry_3_perc

  haplo_df[i] = NULL

  haplo_df[i] = c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc)
  as.data.frame(haplo_df[i])
}

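I get these errors. After trying to set haplo_df[i] = NULL:

Error in haplo_df[i] = NULL : object 'haplo_df' not found

Then, after

haplo_df[i] = c(ancestry_1_perc, ancestry_2_perc, ancestry_3_perc)

Error in haplo_df[i] = c(ancestry_1_perc, ancestry_2_perc, ancestry_3_perc) : object 'haplo_df' not found

And again with as.data.frame(haplo_df[i]):

object 'haplo_df' not found

My desired output should look like this: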

0.00    66.66  50.0
100.00  33.33  33.33
0.00    0.00   16.66

Answer 1

You need to define the result matrix before the loop, and then cbind each new result to that matrix.

# define the result object before the loop (cbind will turn it into a matrix)
haplo_df <- NULL
for (i in 1:3 ) {
  length(chr22_hap12[,i]) -> total_snps
  sum(chr22_hap12[,i]==1,na.rm=FALSE) -> counts_ancestry_1
  sum(chr22_hap12[,i]==2,na.rm=FALSE) -> counts_ancestry_2
  sum(chr22_hap12[,i]==3,na.rm=FALSE) -> counts_ancestry_3

  (counts_ancestry_1*100)/total_snps -> ancestry_1_perc
  (counts_ancestry_2*100)/total_snps -> ancestry_2_perc
  (counts_ancestry_3*100)/total_snps -> ancestry_3_perc

  # bind the new result to the existing data
  haplo_df <- cbind(haplo_df , c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc))
}
# return the result
haplo_df
##       [,1]     [,2]     [,3]
##  [1,]    0 66.66667 33.33333
##  [2,]  100 33.33333 16.66667
##  [3,]    0  0.00000 50.00000
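
If the number of columns is known up front, the result can also be preallocated and filled in place rather than grown with cbind on every iteration. The following is only a sketch under stated assumptions: the chr22_hap12 below is a small hypothetical stand-in (its values are borrowed from the mydf shown in Answer 4), and the names ancestries and haplo_mat are made up for illustration.

```r
# Hypothetical stand-in for chr22_hap12 (values borrowed from Answer 4's mydf)
chr22_hap12 <- data.frame(V1 = c(2, 2, 2, 2, 2, 2),
                          V2 = c(1, 1, 1, 1, 2, 2),
                          V3 = c(3, 3, 3, 2, 1, 1))

ancestries <- 1:3  # the codes to count, in the order the rows should appear

# Preallocate: one row per ancestry code, one column per input column
haplo_mat <- matrix(NA_real_,
                    nrow = length(ancestries),
                    ncol = ncol(chr22_hap12),
                    dimnames = list(as.character(ancestries), names(chr22_hap12)))

for (i in seq_len(ncol(chr22_hap12))) {
  col_i <- chr22_hap12[[i]]
  # percentage of entries in this column equal to each ancestry code
  haplo_mat[, i] <- sapply(ancestries,
                           function(a) 100 * sum(col_i == a) / length(col_i))
}

# Convert at the end, since the question asks for a data frame
haplo_df <- as.data.frame(haplo_mat)
haplo_df
```

On this sample data the values agree with the matrix printed above. The conversion with as.data.frame is done once after the loop, so the loop itself only writes into an existing matrix.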

Alternatively, you can just use apply and table, for example:

apply(chr22_hap12, 2, function(x) 100*table(factor(x, levels=1:3))/length(x))
##     V1       V2       V3
##  1   0 66.66667 33.33333
##  2 100 33.33333 16.66667
##  3   0  0.00000 50.00000
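
The factor(x, levels=1:3) part of that call is what keeps a row for every value, including values that never occur in a column. A small sketch of the difference, using a hypothetical vector x that contains only the value 2:

```r
# Hypothetical column in which only the value 2 occurs
x <- c(2, 2, 2, 2, 2, 2)

# Plain table() only reports values that are actually present
counts_present <- table(x)
length(counts_present)

# Fixing the levels keeps a (zero) count for 1 and 3 as well
counts_all <- table(factor(x, levels = 1:3))
length(counts_all)

# Percentages for all three levels, in the order 1, 2, 3
pct <- 100 * counts_all / length(x)
pct
```

Because every column then yields exactly three numbers, apply can assemble the per-column results into a rectangular matrix.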

Answer 2

My one-liner:

sapply(df, function(x){prop.table(table(x))*100})
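
One caveat worth checking on your own data: sapply only simplifies to a matrix when every column produces a result of the same length. The sketch below uses a hypothetical df (values borrowed from Answer 4's mydf) in which the columns contain one, two, and three distinct values respectively, and shows a vapply variant as one possible way to force a fixed shape. The names res_plain and res_fixed are made up for illustration.

```r
# Hypothetical data: V1 has one distinct value, V2 has two, V3 has three
df <- data.frame(V1 = c(2, 2, 2, 2, 2, 2),
                 V2 = c(1, 1, 1, 1, 2, 2),
                 V3 = c(3, 3, 3, 2, 1, 1))

# The one-liner as written: the per-column results have different lengths,
# so sapply cannot simplify and returns a list
res_plain <- sapply(df, function(x) prop.table(table(x)) * 100)
is.list(res_plain)  # TRUE

# Variant: fix the levels and declare the result shape with vapply,
# so each column always contributes exactly three numbers
res_fixed <- vapply(df,
                    function(x) as.numeric(prop.table(table(factor(x, levels = 1:3))) * 100),
                    numeric(3))
dim(res_fixed)      # 3 3
```

If every column of the real data is known to contain all three values, the original one-liner already returns a matrix and the variant is unnecessary.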

Answer 3

Here is another approach.

Sample data:

set.seed(23)
y <- 1:3
df <- data.frame(a = sample(y, 10, replace = TRUE), 
                 b = sample(y, 10, replace = TRUE), 
                 c = sample(y, 10, replace = TRUE))
#df
#   a b c
#1  2 3 2
#2  1 3 1
#3  1 2 1
#4  3 1 3
#5  3 3 2
#6  2 1 3
#7  3 2 3
#8  3 2 3
#9  3 3 1
#10 3 2 3

Compute the percentages:

newdf <- as.data.frame(t(do.call(rbind, lapply(df, function(z){
  sapply(y, function(x) (sum(z == x) / length(z))*100)
}))))

#newdf
#   a  b  c
#1 20 20 30
#2 20 40 20
#3 60 40 50

Answer 4

Try:

mydf
  V1 V2 V3
1  2  1  3
2  2  1  3
3  2  1  3
4  2  1  2
5  2  2  1
6  2  2  1


ll = list()
for(cc in 1:3) {
    dd = mydf[,cc]
    n1 = 100*length(dd[dd==1])/nrow(mydf)
    n2 = 100*length(dd[dd==2])/nrow(mydf)
    n3 = 100*length(dd[dd==3])/nrow(mydf)
    ll[[length(ll)+1]] = c(n1, n2, n3)
}
ll
[[1]]
[1]   0 100   0

[[2]]
[1] 66.66667 33.33333  0.00000

[[3]]
[1] 33.33333 16.66667 50.00000

> t(do.call(rbind, ll))
     [,1]     [,2]     [,3]
[1,]    0 66.66667 33.33333
[2,]  100 33.33333 16.66667
[3,]    0  0.00000 50.00000


Answer 1
1 accepted 2014-10-07 15:51:09

Answer 2
1 2014-10-07 18:47:30

Answer 3
0 2014-10-07 16:04:01

Answer 4
0 2014-10-07 16:04:12
