R: loop over a data frame and add an incremental value to a column based on a condition

Question

I have a data frame like this:

tdf <- structure(list(indx = c(1, 1, 1, 2, 2, 3, 3), group = c(1, 1, 
2, 1, 2, 1, 1)), .Names = c("indx", "group"), row.names = c(NA, 
-7L), class = "data.frame")

The data frame looks like this:

   indx group
1    1     1
2    1     1
3    1     2
4    2     1
5    2     2
6    3     1
7    3     1

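I want to loop over the groups and keep the group values of the first index as they are in the desired output.

For every index after the first, I want to add the maximum desiredOutput value of the previous index to the group values, so that the values keep increasing from the second index onward.

The desired output looks like this: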

    indx group    desiredOutput
1    1     1             1
2    1     1             1
3    1     2             2
4    2     1             3
5    2     2             4
6    3     1             5
7    3     1             5

For clarity, here is the data frame split up by index:

    indx group    desiredOutput
1    1     1             1
2    1     1             1       To be retained as is
3    1     2             2


4    2     1             3       Second index-the max value of desiredOutput in indx1 is 2                   
5    2     2             4       I want to add this max value to the group value in indx 2       


6    3     1             5       Similarly, the max value of des.out of indx2 is 4
7    3     1             5       Adding the max value to group provides me new values

I tried splitting this data frame into a list of data frames and iterating over each one.

ndf <- split(tdf,f = tdf$indx)
x <- 0
for (i in seq_along(ndf)){
    ndf[[i]]$ng <- ndf[[i]]$group+x
    x <- max(ndf[[i]]$indx) + 1
}
ndf

The code above updates the values for the second index, but it fails when it reaches the third index.
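One possible reason the loop breaks at the third index is that x is derived from indx rather than from the values just written to ng. A minimal sketch of the same loop, assuming the carried-over offset should be the running maximum of ng (the name result is introduced here only for illustration):

```r
# Same sample data as in the question
tdf <- data.frame(indx  = c(1, 1, 1, 2, 2, 3, 3),
                  group = c(1, 1, 2, 1, 2, 1, 1))

ndf <- split(tdf, f = tdf$indx)
x <- 0
for (i in seq_along(ndf)) {
  # Shift this index's group values by the offset carried over so far
  ndf[[i]]$ng <- ndf[[i]]$group + x
  # Carry forward the largest value assigned so far (not max(indx) + 1)
  x <- max(ndf[[i]]$ng)
}

# Stitch the pieces back together and reset the row names
result <- do.call(rbind, ndf)
rownames(result) <- NULL
result
```

With the sample data this should give ng = 1, 1, 2, 3, 4, 5, 5, matching desiredOutput.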

Answer 1

First, find the maximum group value for each index, then compute the cumulative sum of those maxima.

library(dplyr)

maxGroupVals <- tdf %>% 
  group_by(indx) %>% 
  summarise(maxVal = max(group)) %>% 
  mutate(indx = indx + 1, maxVal = cumsum(maxVal))

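1 is added to indx because that is the index to which each cumulative maximum will be added. Joining the two data frames gives you a column holding the amount to add. After that it is a simple mutate, with a conditional to handle the indx == 1 case.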

tdf %>% 
  left_join(maxGroupVals, by = "indx") %>%   # explicit key avoids the join message
  mutate(desiredOutput = if_else(indx == 1, group, group + maxVal)) %>% 
  select(-maxVal)

The final select() drops the intermediate column; leave it out if you want to keep maxVal.
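The same idea (per-index maximum, cumulative sum, shift by one index) can also be sketched in base R without a join. This is only an illustration under the same assumption as the dplyr version, namely that the offset for each index is the cumulative maximum of group over the preceding indexes; the names maxPerIndx and offsets are introduced here and are not part of the original answer.

```r
tdf <- data.frame(indx  = c(1, 1, 1, 2, 2, 3, 3),
                  group = c(1, 1, 2, 1, 2, 1, 1))

# Maximum group value within each index
maxPerIndx <- tapply(tdf$group, tdf$indx, max)

# Offset for each index = cumulative maximum of all earlier indexes (0 for the first)
offsets <- c(0, head(cumsum(as.vector(maxPerIndx)), -1))
names(offsets) <- names(maxPerIndx)

# Look up each row's offset by its index and add it to group
tdf$desiredOutput <- tdf$group + unname(offsets[as.character(tdf$indx)])
tdf
```

Prepending 0 plays the role of the indx == 1 special case, so no conditional is needed.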

Answer 2

dplyr version 1.0.1 has the function cur_group_id(), which does exactly what you want. In earlier versions of dplyr, the group_indices() function is what you want:

library(dplyr)
tdf %>% group_by(indx, group) %>%
  mutate(desiredOutput = cur_group_id()) %>%
  ungroup()

Answer 3

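Consider pasting the two columns together, then converting the result to a factor, and then to an integer. The factor levels are set with unique() so that they follow the order of appearance in the original data frame instead of being sorted alphabetically or numerically.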

tdf <- within(tdf, {
    tmp <- paste(indx, group, sep="&")    
    new_indx <- as.integer(factor(tmp, levels=unique(tmp)))
    rm(tmp)    
})

tdf
#   indx group new_indx
# 1    1     1        1
# 2    1     1        1
# 3    1     2        2
# 4    2     1        3
# 5    2     2        4
# 6    3     1        5
# 7    3     1        5
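To see why the levels = unique(tmp) part matters, here is a small sketch with made-up index values (2 and 10, not from the question) where the default string-sorted levels would number the groups out of order:

```r
# Hypothetical keys: index 2 appears before index 10
tmp <- paste(c(2, 2, 10, 10), c(1, 2, 1, 1), sep = "&")

# Default levels are sorted as strings, so "10&1" comes before "2&1"
sorted_ids <- as.integer(factor(tmp))

# Levels in order of first appearance keep the numbering sequential
appearance_ids <- as.integer(factor(tmp, levels = unique(tmp)))

sorted_ids
appearance_ids
```

Here sorted_ids comes out as 2, 3, 1, 1 while appearance_ids is 1, 2, 3, 3.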

Answer 4

To get a running count of the unique indx/group combinations, you can simply do the following (on pre-sorted data):

tdf$desiredOutput <- cumsum(!duplicated(tdf))

This gives:

  indx group desiredOutput
1    1     1             1
2    1     1             1
3    1     2             2
4    2     1             3
5    2     2             4
6    3     1             5
7    3     1             5
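Note that duplicated(tdf) compares whole rows, so the one-liner relies on tdf containing only the key columns. If the data frame carries other columns, a sketch along these lines restricts the check to indx and group (the value column and the name keys are hypothetical, added only for illustration):

```r
# Sample data plus a hypothetical extra column that differs on every row
tdf <- data.frame(indx  = c(1, 1, 1, 2, 2, 3, 3),
                  group = c(1, 1, 2, 1, 2, 1, 1),
                  value = letters[1:7])

# Only the key columns decide whether a row starts a new combination
keys <- tdf[c("indx", "group")]
tdf$desiredOutput <- cumsum(!duplicated(keys))

# For comparison: using all columns treats every row as unique
all_cols <- cumsum(!duplicated(tdf[c("indx", "group", "value")]))
tdf
```

As in the original answer, this still assumes the rows are already sorted so that identical combinations are adjacent.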

Answer 1
1 (accepted) 2020-08-14 15:24:29

Answer 2
1 2020-08-14 15:24:55

Answer 3
1 2020-08-14 15:37:09

Answer 4
1 2020-08-14 15:41:12
