[英]R for loop for calculating sums based on a data frame's different columns
我當前的數據框如下所示:
# Create sample data
my_df <- data.frame(seq(1, 100), rep(c("ind_1", "", "", ""), times = 25), rep(c("", "ind_2", "", ""), times = 25), rep(c("", "", "ind_3", ""), times = 25), rep(c("", "", "", "ind_4"), times = 25))
# Rename columns
names(my_df)[names(my_df)=="seq.1..100."] <- "value"
names(my_df)[names(my_df)=="rep.c..ind_1................times...25."] <- "ind_1"
names(my_df)[names(my_df)=="rep.c......ind_2............times...25."] <- "ind_2"
names(my_df)[names(my_df)=="rep.c..........ind_3........times...25."] <- "ind_3"
names(my_df)[names(my_df)=="rep.c..............ind_4....times...25."] <- "ind_4"
# Replace empty elements with NA
my_df[my_df==''] = NA
我要編寫的腳本是一個相當簡單的for
循環,它為四個ind_*
列中的每一個計算value
列的總和並輸出結果。
到目前為止,我非常微薄的嘗試是:
# Create a vector with all individuals
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
# Calculate aggregates for each individual
for (i in individuals){
ind <- 1
sum_i <- aggregate(value~ind_1, data = my_df, sum)
print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
ind <- ind + 1
}
正如您所看到的,我當前很難包含正確的命令來計算一列接一列的總和作為當前輸出,自然地,僅計算ind_1
的結果。 為了達到預期的結果,需要在aggregate
命令中進行哪些更改(我是一個初學者,但是想到了使用索引從一列轉到另一列嗎?)?
假設您要計算ind列與您的個人向量中的表達式匹配的總和:
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
for (i in 1:(ncol(my_df)-1)){
print(sum(my_df$value[which(my_df[,individuals[i]] == individuals[i])]))
}
為什么要使用print()
而不是將結果存儲在單獨的向量中?
您也可以嘗試tidyverse
:
my_df %>%
gather(key, Inds, -value) %>%
filter(!is.na(Inds)) %>%
group_by(key) %>%
summarise(Sum=sum(value))
# A tibble: 4 x 2
key Sum
<chr> <int>
1 ind_1 1225
2 ind_2 1250
3 ind_3 1275
4 ind_4 1300
想法是使用gather
來使數據長。 過濾出NA
,然后按Inds分組並匯總值。
更基本的R解決方案是:
library(reshape2)
my_df_long <- melt(my_df, id.vars = "value",value.name = "ID")
aggregate(value ~ ID, my_df_long, sum, na.rm= T)
ID value
1 ind_1 1225
2 ind_2 1250
3 ind_3 1275
4 ind_4 1300
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.