
R - Apply function - Replacing Observations with 0 Depending on an Observation's Variable to Determine which Column to Start

I have the following data frame:

Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 40)
Forecast.2017.06.30 <- c(150, 300, 40)
Forecast.2017.09.30 <- c(100, 100, 60)
Forecast.2017.12.31 <- c(150, 500, 100)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA)

df <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)

Sequence <- c("2017.03.31","2017.06.30", "2017.09.30", "2017.12.31")

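If the Disable variable is TRUE, I want to replace all of that observation's forecasts with 0, unless the Start variable gives a date from which the forecasts should start being zeroed. So I would get the following table: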

Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 0)
Forecast.2017.06.30 <- c(150, 0, 0)
Forecast.2017.09.30 <- c(100, 0, 0)
Forecast.2017.12.31 <- c(150, 0, 0)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA) 

df2 <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)

I am using the following expression to replace all forecasts for rows where Disable is TRUE. However, it does not take into account the date from which to start replacing forecasts with 0.

df[,grep(paste0("Forecast.",min(Sequence)),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))] <- apply(df[,grep(paste0("Forecast.",min(Sequence)),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))], 2, 
    function(x) { replace(x,df$Disable == TRUE,0)})

To take the start date into account, I tried replacing the min(Sequence) part with ifelse(!is.na(df$Start),df$Start,min(Sequence)), as follows:

df[,grep(paste0("Forecast.",ifelse(!is.na(df$Start),df$Start,min(Sequence))),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))] <- apply(df[,grep(paste0("Forecast.",ifelse(!is.na(df$Start),df$Start,min(Sequence))),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))], 2, 
    function(x) { replace(x,df$Disable == TRUE,0)})

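But I get the following message:

"argument 'pattern' has length > 1 and only the first element will be used"

I'm not sure how I should change the code so that it refers to the Start date when one exists.

Any help is appreciated.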
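The message appears because grep() expects a single pattern, while ifelse() applied to df$Start returns one value per row. A minimal sketch of the difference, reusing the Start and Sequence values from the question (forecast_cols, start_date and start_col are names made up for this illustration):

```r
Start <- c(NA, "2017.06.30", NA)
Sequence <- c("2017.03.31", "2017.06.30", "2017.09.30", "2017.12.31")
forecast_cols <- paste0("Forecast.", Sequence)

# ifelse() is vectorised: it yields one start date per row,
# so paste0() produces three patterns rather than one
start_date <- ifelse(!is.na(Start), Start, min(Sequence))
length(start_date)

# match() accepts a vector, so it can look up a start column for every row
start_col <- match(paste0("Forecast.", start_date), forecast_cols)
start_col
```

So the start column has to be resolved row by row rather than through a single grep() call.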

Here is one way to do it. We create a function that replaces values with 0, i.e.

Fun1 <- function(df, var, n) {
  ind1 <- grep('Forecast.', names(df))
  replace(df[n,], var[n]:max(ind1), 0)
  }
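
To see what the replace() call inside Fun1 does to a single row, here is a small sketch on a hypothetical one-row data frame (one_row and its column names F1 to F3 are invented for illustration and are not part of the question's data):

```r
# Hypothetical one-row data frame, only for illustration
one_row <- data.frame(Farm = "DEF", F1 = 200, F2 = 300, F3 = 100,
                      stringsAsFactors = FALSE)

# Set the columns at positions 3 to 4 (F2 and F3) to 0; Farm and F1 are kept
zeroed <- replace(one_row, 3:4, 0)
zeroed
```

In Fun1 the start of that range is var[n], the starting column for row n, and the end is the last forecast column.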


#create a new column which indicates when to start replacing with 0 based on Start variable
df$new <- sapply(df$Start, function(i) match(i, sub('^Forecast.', '', names(df))))

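#Positions of the forecast columns (ind1 inside Fun1 is local, so define it here too)
ind1 <- grep('Forecast.', names(df))

#Handle the NA in column "new": disabled rows without a Start begin at the first forecast column
df$new[is.na(df$new) & df$Disable == TRUE] <- min(ind1)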

#Identify rows to change the values
ind2 <- which(!is.na(df$new))

#Apply the function
df[ind2,] <- as.data.frame(t(sapply(ind2, function(i) unlist(Fun1(df, df$new, i)))), stringsAsFactors = FALSE)

#unlist() turned the changed rows into character, so use ind1 to convert the forecast columns back to integers
df[ind1] <- lapply(df[ind1], as.integer)


#  Farm YearlyVolume Forecast.2017.03.31 Forecast.2017.06.30 Forecast.2017.09.30 Forecast.2017.12.31 Disable      Start  new
#1  ABC          500                 100                 150                 100                 150    <NA>       <NA> <NA>
#2  DEF         1000                 200                   0                   0                   0    TRUE 2017.06.30    4
#3  XYZ          200                   0                   0                   0                   0    TRUE       <NA>    3

Note

I read your data frame with stringsAsFactors = FALSE, i.e.

df <- data.frame(Farm, YearlyVolume, 
                  Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, 
                  Forecast.2017.12.31, Disable, Start, stringsAsFactors = FALSE)
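
As a possible alternative to the row-wise function, the same result can be sketched with a logical mask over the forecast columns, which leaves the other columns' types untouched. This is only one way to do it, assuming the forecast columns appear in date order; fc, dates, first, vals and mask are names chosen for this example:

```r
df <- data.frame(
  Farm = c("ABC", "DEF", "XYZ"),
  YearlyVolume = c(500, 1000, 200),
  Forecast.2017.03.31 = c(100, 200, 40),
  Forecast.2017.06.30 = c(150, 300, 40),
  Forecast.2017.09.30 = c(100, 100, 60),
  Forecast.2017.12.31 = c(150, 500, 100),
  Disable = c(NA, TRUE, TRUE),
  Start = c(NA, "2017.06.30", NA),
  stringsAsFactors = FALSE
)

fc <- grep("^Forecast\\.", names(df))            # forecast column positions
dates <- sub("^Forecast\\.", "", names(df)[fc])  # dates taken from the column names

# First forecast column to zero for each row:
# the Start date if one is given, otherwise the first forecast column
first <- match(df$Start, dates)
first[is.na(first)] <- 1L

# TRUE where a cell is in a disabled row and at or after that row's start column
vals <- as.matrix(df[fc])
mask <- (df$Disable %in% TRUE) & (col(vals) >= first)
vals[mask] <- 0
df[fc] <- as.data.frame(vals)
df
```

Using %in% TRUE treats an NA in Disable as "not disabled", so row ABC is left unchanged.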

