Using tidyverse to filter, summarise, and store the result at a specified position in a new column of the same data frame

Question

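After filtering the data down to the subset I want to summarise, I am trying to add the result of a summary statistic to another column of the same tibble, at a specified position. The tibble contains results of a spatially explicit simulation in a landscape of grid cells. I have columns specifying the row and column of the landscape, and a column with the results. What I want to do is take a target grid cell, e.g. row = 2, col = 2, and calculate the variance across the target cell and its eight surrounding cells. The result should be stored in a new column of the data frame, in the row of the target grid cell. Filtering the data to select only the nine patches I am interested in works fine, but storing the result in the new column at a specific position does not. I need a general solution, because I want to loop over all grid cells (given by row and col) and over several tibbles containing similar data for different landscapes. I am including example code here; my real dataset is much larger.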

library(tidyverse)

data <- tibble(row=c(1,1,1, 2,2,2, 3,3,3), col=c(1,2,3, 1,2,3, 1,2,3), x=c(0.5, 0.5, 0.5, 0.4, 0.4, 0.4, 0.3, 0.3, 0.3), cluster_var=0)
> data
# A tibble: 9 x 4
    row   col     x cluster_var
  <dbl> <dbl> <dbl>       <dbl>
1     1     1   0.5           0
2     1     2   0.5           0
3     1     3   0.5           0
4     2     1   0.4           0
5     2     2   0.4           0
6     2     3   0.4           0
7     3     1   0.3           0
8     3     2   0.3           0
9     3     3   0.3           0

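Let's say this is the tibble containing my results. Now I want to select the target grid cell and its eight neighbouring cells, e.g. row = 2, col = 2, and calculate the variance of x over these nine cells, so I did this: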

i_row=2
i_col=2

  data%>%filter(row==(i_row-1) | row == (i_row+1) | row==i_row) %>% 
  filter(col==(i_col-1) | col==(i_col+1) | col==i_col) %>% 
  summarise(var(x))
# A tibble: 1 x 1
  `var(x)`
     <dbl>
1   0.0075

Now I want to store it in data$cluster_var in the row where row = 2 and col = 2, so that the resulting tibble would be:

> data
# A tibble: 9 x 4
    row   col     x cluster_var
  <dbl> <dbl> <dbl>       <dbl>
1     1     1   0.5           0
2     1     2   0.5           0
3     1     3   0.5           0
4     2     1   0.4           0
5     2     2   0.4           0.0075
6     2     3   0.4           0
7     3     1   0.3           0
8     3     2   0.3           0
9     3     3   0.3           0

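Of course, I need to loop over all possible values of row and col to fill the whole cluster_var column, and the real dataset is very large, so I cannot do it by hand. I tried using mutate, but it did not work the way I wanted.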

data%>%
  mutate(., cluster_var[row==i_row, col==i_col] = 
  filter(row==(i_row-1) | row == (i_row+1) | row==i_row) %>% 
  filter(col==(i_col-1) | col==(i_col+1) | col==i_col) %>% 
  summarise(var(x)))
Error: unexpected '=' in "data%>%
  mutate(., cluster_var[row==i_row, col==i_col] ="

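Right now I am at a loss and would greatly appreciate your help.

Edit: More information about my data. The nine entries I am interested in are not in consecutive rows. When the target cell has row = 2 and col = 2, the values I am interested in are: [2,1], [2,3], [1,1], [2,1], [3,1], [1,3], [2,3], [3,3]. In the example data they sit in consecutive rows, but in my real data I have 64 rows and 64 columns: the first 64 rows of the tibble have row = 1 while col runs over 1:64, then row = 2 with col again 1:64, and so on, for 4096 rows in total. So the row numbers of the results I want to summarise are not linked to the values in row or col.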
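An aside on the mutate() attempt above: mutate() takes name = value pairs, so a subset expression such as cluster_var[...] on the left-hand side of = cannot be parsed. A minimal sketch of one way to write a value into a single row, assuming the example tibble from the question and using if_else() to leave all other rows untouched (the names target_var and v are introduced here for illustration only):

```r
library(dplyr)
library(tibble)

# Example data from the question
data <- tibble(
  row = rep(1:3, each = 3),
  col = rep(1:3, times = 3),
  x = rep(c(0.5, 0.4, 0.3), each = 3),
  cluster_var = 0
)

i_row <- 2
i_col <- 2

# Variance of x over the target cell and its neighbours (cells at most one step away)
target_var <- data %>%
  filter(abs(row - i_row) <= 1, abs(col - i_col) <= 1) %>%
  summarise(v = var(x)) %>%
  pull(v)

# Overwrite cluster_var only in the target row; keep the old value elsewhere
data <- data %>%
  mutate(cluster_var = if_else(row == i_row & col == i_col, target_var, cluster_var))
```

This handles one target cell only; filling the whole column still requires repeating the step for every combination of row and col.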

Answer 1

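As I understand it, you want to calculate the variance of nine values, including the value of the target cell. This solution uses the row index of the data frame and a unique key to locate the target cell. Below is a solution using a for loop and dplyr: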

library(tidyverse)

df <- tibble(row=c(1,1,1, 2,2,2, 3,3,3), col=c(1,2,3, 1,2,3, 1,2,3), x=c(0.5, 0.5, 0.5, 0.4, 0.4, 0.4, 0.3, 0.3, 0.3), cluster_var=0)
l <- c() # empty vector which will be used for storing the variance values
df$RowNumber <- as.numeric(row.names(df)) # row index as a number
df$key <- paste0(df$row, ",", df$col) # unique key for each grid cell

keyList <- unique(df$key) # all unique keys; the loop runs over these

for (i in seq_along(keyList)) {

  #cat("Running For:",i,'\n')
  # row index of the target cell
  rowIndx <- df %>%
    filter(key == keyList[i]) %>%
    pull(RowNumber)

  # indices of the 9 rows centred on the target row
  # (this assumes the nine values of interest sit in contiguous rows)
  filterValues <- (rowIndx - 4):(rowIndx + 4)

  l[i] <- df %>%
    filter(RowNumber %in% filterValues) %>%
    summarise(cluster_var = var(x)) %>%
    pull(cluster_var)

}

df$cluster_var <- l # add the calculated variances to the data frame

This may not be the best solution.

Answer 2

I found a solution that works for most purposes. It is not purely tidyverse, but it works. The following code does what I want:

data=tibble(row=c(1,1,1, 2,2,2, 3,3,3), col=c(1,2,3, 1,2,3, 1,2,3), x=c(0.5, 0.5, 0.5, 0.4, 0.4, 0.4, 0.3, 0.3, 0.3))
cluster_var=numeric(nrow(data))

for(i in 1:max(data$row)){
  for(j in 1:max(data$col)){
    i_row=i
    i_col=j
    position=which(data$row==i_row & data$col==i_col)
    cluster_var_temp= as.numeric(data%>%
                                   filter(row==(i_row-1) | row == (i_row+1) | row==i_row) %>% 
                                   filter(col==(i_col-1) | col==(i_col+1) | col==i_col) %>% 
                                   summarise(var(x)))
    cluster_var[position]=cluster_var_temp

  }

} 

data=cbind(data, cluster_var)

> data
  row col   x cluster_var
1   1   1 0.5 0.003333333
2   1   2 0.5 0.003000000
3   1   3 0.5 0.003333333
4   2   1 0.4 0.008000000
5   2   2 0.4 0.007500000
6   2   3 0.4 0.008000000
7   3   1 0.3 0.003333333
8   3   2 0.3 0.003000000
9   3   3 0.3 0.003333333

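Thanks for the help, everyone, and @Nirbhay Singh in particular. You pointed me in the right direction. Maybe this will help someone searching for this or something similar in the future.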
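For readers who would rather avoid the explicit double loop, the same calculation can be sketched with purrr. This is only one possible approach, not the accepted method: the helper names neighbourhood_var and add_cluster_var are hypothetical, and the sketch assumes each tibble has row, col and x columns as in the example.

```r
library(dplyr)
library(purrr)
library(tibble)

data <- tibble(
  row = rep(1:3, each = 3),
  col = rep(1:3, times = 3),
  x = rep(c(0.5, 0.4, 0.3), each = 3)
)

# Hypothetical helper: variance of x over all cells at most one step away
# from (i_row, i_col). Edge and corner cells simply have fewer neighbours.
neighbourhood_var <- function(df, i_row, i_col) {
  in_window <- abs(df$row - i_row) <= 1 & abs(df$col - i_col) <= 1
  var(df$x[in_window])
}

# Hypothetical wrapper: compute the neighbourhood variance for every cell
# and store it in a new column, in the row of the respective target cell.
add_cluster_var <- function(df) {
  df %>%
    mutate(cluster_var = map2_dbl(row, col, ~ neighbourhood_var(df, .x, .y)))
}

result <- add_cluster_var(data)
```

Because the neighbours are selected by their row and col values rather than by their position in the tibble, the ordering of the rows does not matter. To process several tibbles, the wrapper could be applied to a list of them, e.g. map(list_of_tibbles, add_cluster_var), where list_of_tibbles is an assumed name.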


Answer 1
1 2020-05-04 11:30:20

Answer 2
0 Accepted 2020-05-08 12:00:56
