
R programming: double-indexing loop to find the mean of a subsetted data frame

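I want to run a "for" loop that uses three indices. Basically, I want to subset a data frame, take the mean of each subset, and put that mean into a new data frame. I'm having trouble running this loop; all I get are NaNs.

The first index matches the rows of the new data frame (which I call data.avg); the second index steps through a vector used in the first half of the subsetting condition (the date falls in a particular month); the third index does the same for the second half of the condition (the row's serving is Breakfast / Dinner / Snacks).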

# Create the data frame
data1 = data.frame(date = sort(rep(as.Date(42948:43101, origin = "1899-12-30"),3)),
               serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
               units = rep(c(1,5,49), 154)
)
View(data1[order(data1$date),])

# take mean of each subset and place it in a new data frame called data.avgs
# it should consist of 8x3 data frame; rows (column1) are "August","September", "October", "November", "December", "January","February", "March".
# columns should be "Breakfast", "Dinner", "Snack"
month.index = c(8:12, 1)
serving.index = c("Breakfast", "Dinner", "Snack")

# create the data frame with the means using placeholder data
data.avg = data.frame(months = c(month.name[8:12], month.name[1]),
                  bf.avg = c(1:6),
                  dinner.avg = c(1:6),
                  snack.avg = c(1:6))

# now start replacing; find the mean of the subset of the original data frame.
# find the mean of all dates that are for August, and whose serving type are for Breakfast. 

for(j in 1:6){
  for(i in month.index){
    for(v in 2:4){
      data.avg[j,v] = mean(
        subset(data1,
               months(data1$date) == month.name[i] & data1$serving == serving.index[v])$units
      )
    }
  }
}

For example, when I compute the mean without the loop, like this:

mean(subset(data1, 
            months(data1$date) == "September" & data1$serving == "Breakfast")$unit)

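I get the correct mean. So I think my problem may lie in how the indices are set up.

Any help would be greatly appreciated.

Thanks

Edit: fixed the code above. The resulting data frame is as follows: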

     months bf.avg dinner.avg snack.avg
1    August      5         49       NaN
2 September      5         49       NaN
3   October      5         49       NaN
4  November      5         49       NaN
5  December      5         49       NaN
6   January      5         49       NaN

This is what I am looking for:

> mean(subset(data1, 
+             months(data1$date) == "September" & data1$serving == "Breakfast")$unit)
[1] 1
> mean(subset(data1, 
+             months(data1$date) == "September" & data1$serving == "Dinner")$unit)
[1] 5
> mean(subset(data1, 
+             months(data1$date) == "September" & data1$serving == "Snacks")$unit)
[1] 49

My understanding is that these should be data1.avg [1,1:3]

You set "Snack" in serving.index, but data1 contains "Snacks". The two strings never match, so that subset is empty and its mean is NaN.
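The mismatch can be checked directly. The following is a minimal sketch that rebuilds data1 and serving.index exactly as in the question; the names missing.labels and empty.units are introduced here only for illustration. It also shows that a loop over 2:4 reads past the end of the three-element serving.index.

```r
# Rebuild the objects from the question
data1 <- data.frame(
  date    = sort(rep(as.Date(42948:43101, origin = "1899-12-30"), 3)),
  serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
  units   = rep(c(1, 5, 49), 154)
)
serving.index <- c("Breakfast", "Dinner", "Snack")

# Labels requested by the loop that never occur in the data
# (missing.labels is a hypothetical helper name)
missing.labels <- setdiff(serving.index, unique(as.character(data1$serving)))
print(missing.labels)

# A condition that matches no rows leaves an empty vector,
# and the mean of an empty numeric vector is NaN
empty.units <- data1$units[data1$serving == "Snack"]
print(length(empty.units))
print(mean(empty.units))

# Indexing past the end of a character vector gives NA, which also matches nothing
print(serving.index[4])
```

Comparing the labels with setdiff() before looping is a cheap way to catch this kind of typo early.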

Then try the following code inside the for loop:

data.avg[j,v+1] = mean(
    subset(data1,months(data1$date) == month.name[i] & as.character(data1$serving) == serving.index[v])$units)

data.avg
     months bf.avg dinner.avg snack.avg
1    August      1          5        49
2 September      1          5        49
3   October      1          5        49
4  November      1          5        49
5  December      1          5        49
6   January      1          5        49
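For reference, here is one way the whole corrected loop might look. This is only a sketch under several assumptions that go beyond the answer: v runs over 1:3 (which the v + 1 column offset suggests), the separate i loop is dropped so that row j uses month.index[j], and the month is compared numerically via format(date, "%m") instead of months() so the result does not depend on the locale's month names. The names row.month, keep and cell.means are hypothetical.

```r
# Rebuild the data from the question
data1 <- data.frame(
  date    = sort(rep(as.Date(42948:43101, origin = "1899-12-30"), 3)),
  serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
  units   = rep(c(1, 5, 49), 154)
)

month.index   <- c(8:12, 1)
serving.index <- c("Breakfast", "Dinner", "Snacks")  # spelled as in data1

# Numeric month of every row (assumption: used here instead of months())
row.month <- as.integer(format(data1$date, "%m"))

# Result frame with NA placeholders instead of dummy numbers
data.avg <- data.frame(
  months     = month.name[month.index],
  bf.avg     = NA_real_,
  dinner.avg = NA_real_,
  snack.avg  = NA_real_
)

for (j in seq_along(month.index)) {
  for (v in seq_along(serving.index)) {
    # Rows belonging to month j and serving v
    keep <- row.month == month.index[j] &
      as.character(data1$serving) == serving.index[v]
    # Column 1 holds the month name, so serving v goes into column v + 1
    data.avg[j, v + 1] <- mean(data1$units[keep])
  }
}
print(data.avg)

# Loop-free cross-check: one mean per (month, serving) cell
cell.means <- tapply(data1$units,
                     list(row.month, as.character(data1$serving)),
                     mean)
print(cell.means)
```

The tapply() call at the end computes the same cell means without any explicit loop, which may be preferable once the indexing logic is understood.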
