R - Apply function - Replacing Observations with 0 Depending on an Observation's Variable to Determine which column to Start
I have the following data frame:
Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 40)
Forecast.2017.06.30 <- c(150, 300, 40)
Forecast.2017.09.30 <- c(100, 100, 60)
Forecast.2017.12.31 <- c(150, 500, 100)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA)
df <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)
Sequence <- c("2017.03.31","2017.06.30", "2017.09.30", "2017.12.31")
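If the "Disable" variable is TRUE, I want to replace all of that observation's forecasts with 0, unless the "Start" variable gives a date, in which case only the forecasts from that date onward should be replaced. So I would get the following table: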
Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 0)
Forecast.2017.06.30 <- c(150, 0, 0)
Forecast.2017.09.30 <- c(100, 0, 0)
Forecast.2017.12.31 <- c(150, 0, 0)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA)
df2 <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)
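I am using the following code to replace all forecasts in rows where "Disable" is TRUE. However, it does not take into account the date from which the forecasts should start being replaced with 0.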
df[,grep(paste0("Forecast.",min(Sequence)),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))] <- apply(df[,grep(paste0("Forecast.",min(Sequence)),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))], 2,
function(x) { replace(x,df$Disable == TRUE,0)})
To take the start date into account, I tried replacing the min(Sequence) part with ifelse(!is.na(df$Start),df$Start,min(Sequence)), as follows:
df[,grep(paste0("Forecast.",ifelse(!is.na(df$Start),df$Start,min(Sequence))),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))] <- apply(df[,grep(paste0("Forecast.",ifelse(!is.na(df$Start),df$Start,min(Sequence))),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))], 2,
function(x) { replace(x,df$Disable == TRUE,0)})
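But I get the following error:
"argument 'pattern' has length > 1 and only the first element will be used"
I am not sure how I should change the code so that it refers to the "Start" date when one exists.
Any help is appreciated.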
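On the message quoted above: a likely cause is that ifelse() is vectorised, so it returns one start date per row rather than a single value, and grep() only uses one pattern. The sketch below is a minimal illustration under that assumption; the names `cols`, `start_dates`, `patterns` and `start_pos` are made up for the example and are not part of the original code.

```r
Sequence <- c("2017.03.31", "2017.06.30", "2017.09.30", "2017.12.31")
Start <- c(NA, "2017.06.30", NA)
cols <- paste0("Forecast.", Sequence)   # hypothetical stand-in for the forecast column names

# ifelse() returns one element per row, not a single start date
start_dates <- ifelse(!is.na(Start), Start, min(Sequence))
patterns <- paste0("Forecast.", start_dates)
length(patterns)   # more than one pattern, which grep() does not accept

# match() takes a vector and returns one column position per row
start_pos <- match(patterns, cols)
```

Because each row can have its own starting column, a single `from:to` column range cannot express the replacement; the starting position has to be handled per row.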
Here is one way to do it. We create a function that replaces values with 0, i.e.
Fun1 <- function(df, var, n) {
ind1 <- grep('Forecast.', names(df))
replace(df[n,], var[n]:max(ind1), 0)
}
#create a new column which indicates when to start replacing with 0 based on Start variable
df$new <- sapply(df$Start, function(i) match(i, sub('^Forecast.', '', names(df))))
#Handle the NA in column "new" (ind1 only exists inside Fun1, so define it here as well)
ind1 <- grep('Forecast.', names(df))
df$new[is.na(df$new) & df$Disable == TRUE] <- min(ind1)
#Identify rows to change the values
ind2 <- which(!is.na(df$new))
#Apply the function
df[ind2,] <- as.data.frame(t(sapply(ind2, function(i) unlist(Fun1(df, df$new, i)))), stringsAsFactors = FALSE)
#use ind1 to convert to integers,
df[ind1] <- lapply(df[ind1], as.integer)
#  Farm YearlyVolume Forecast.2017.03.31 Forecast.2017.06.30 Forecast.2017.09.30 Forecast.2017.12.31 Disable      Start  new
#1  ABC          500                 100                 150                 100                 150    <NA>       <NA> <NA>
#2  DEF         1000                 200                   0                   0                   0    TRUE 2017.06.30    4
#3  XYZ          200                   0                   0                   0                   0    TRUE       <NA>    3
NOTE
I read your data frame with stringsAsFactors = FALSE, i.e.
df <- data.frame(Farm, YearlyVolume,
Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30,
Forecast.2017.12.31, Disable, Start, stringsAsFactors = FALSE)
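As an alternative, the same replacement can probably be done without converting rows to character and back. The following is only a sketch, assuming the forecast columns appear in chronological order and that every non-missing "Start" value matches one of the forecast dates; the names `fc_cols`, `fc_dates`, `start_pos`, `disabled` and `zero_mask` are introduced here for illustration.

```r
df <- data.frame(
  Farm = c("ABC", "DEF", "XYZ"),
  YearlyVolume = c(500, 1000, 200),
  Forecast.2017.03.31 = c(100, 200, 40),
  Forecast.2017.06.30 = c(150, 300, 40),
  Forecast.2017.09.30 = c(100, 100, 60),
  Forecast.2017.12.31 = c(150, 500, 100),
  Disable = c(NA, TRUE, TRUE),
  Start = c(NA, "2017.06.30", NA),
  stringsAsFactors = FALSE
)

# Positions of the forecast columns and the dates encoded in their names
fc_cols <- grep("^Forecast\\.", names(df))
fc_dates <- sub("^Forecast\\.", "", names(df)[fc_cols])

# Per row: index (within the forecast columns) where zeroing starts;
# rows without a Start date begin at the first forecast column
start_pos <- match(df$Start, fc_dates)
start_pos[is.na(start_pos)] <- 1L

# Treat a missing Disable flag as "not disabled"
disabled <- !is.na(df$Disable) & df$Disable

# Logical matrix: TRUE for cells in a disabled row at or after its start position
zero_mask <- outer(seq_len(nrow(df)), seq_along(fc_cols),
                   function(i, j) disabled[i] & j >= start_pos[i])

df[fc_cols][zero_mask] <- 0
```

With this approach the forecast columns stay numeric throughout, so no conversion step is needed afterwards, and no helper column is added to the data frame.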