Consider the following dataset:
df<-data.frame(ID=c(1,2), Value_1=c(1,7), Value_2= c(NA,10), Value_3=c(NA,13), Value_4=c(7,NA))
What I would like to achieve is this:
df_target<-data.frame(ID=c(1,2), Value_1=c(1,7), Value_2= c(3,10), Value_3=c(5,13), Value_4=c(7,16))
As you can see, there are two different issues here: NAs that lie between two known values (ID 1) and an NA that comes after the last known value (ID 2). For the gap between known values, my idea was to compute a step as
(last_known + previous_known)/number_of_elements
and keep adding this number to the last known value until the next known value is reached, i.e. (1+7)/4 = 2 --> 1; 1+2; 1+2+2; 7. But how do I combine the two cases? The first case is the most challenging part. I guess it should be done with median(last_known, previous_known), then somehow count the missing values, map each one to an na_count_id, and then add the product of that value and the corresponding na_count_id to the previous known value:
previous_known_value + na_count_id*median
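A minimal sketch of that formula for a single gap, under one assumption that is not in the post: the step is taken as the difference between the two known values divided by the number of intervals between them, which is one reading that reproduces the target values 3 and 5 for ID 1. The helper name `fill_gap` is hypothetical.

```r
# Hypothetical helper (not from the original post): fill the NAs that lie
# strictly between two known values with equally spaced steps.
fill_gap <- function(previous_known, next_known, n_missing) {
  # Assumption: step = difference of the known values / number of intervals
  step <- (next_known - previous_known) / (n_missing + 1)
  na_count_id <- seq_len(n_missing)  # 1, 2, ..., n_missing
  previous_known + na_count_id * step
}

# ID 1: known values 1 and 7 with two NAs in between
fill_gap(1, 7, 2)
```

For ID 1 this gives a step of 2 and fills the gap with 3 and 5. The trailing NA in ID 2 is not covered by this helper, since there is no next known value to interpolate towards.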
Thanks in advance for your help!
Here is a solution that works. Based on the testing I did, it should work even if there is an NA in the first column. I iterate over every row, column by column. The increaser variable is the amount by which each column must be increased over the previous column to get the pattern you are looking for. Note that the increaser is computed once per row from its minimum and maximum, so this assumes each row changes by a constant step.
library(tidyverse)
df <- column_to_rownames(df, var = "ID") # move ID into the row names so only value columns remain
for (i in 1:nrow(df)) {
  # increaser: range of the row divided by the distance between the
  # column positions of the row's max and min
  row_range <- range(df[i, ], na.rm = TRUE)
  increaser <- as.numeric((row_range[2] - row_range[1]) / (which.max(df[i, ]) - which.min(df[i, ])))
  for (j in 1:ncol(df)) { # iterate through every column
    if (is.na(df[i, j])) {
      if (j == 1) {
        # first column: there is no previous column, so step backwards
        # from the first non-NA column of this row
        first_known <- min(which(!is.na(df[i, ])))
        df[i, j] <- df[i, first_known] - increaser * (first_known - j)
      } else {
        # NA not in the first column: previous column plus the increaser
        df[i, j] <- df[i, j - 1] + increaser
      }
    }
    # non-NA positions are left unchanged
  }
}
# This loop returns the following. You can convert the row names back to an ID column if desired.
#   Value_1 Value_2 Value_3 Value_4
# 1       1       3       5       7
# 2       7      10      13      16
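As a possible alternative to the nested loop, the same filling can be sketched per row in base R with `approx()`. This is only a sketch under stated assumptions: `fill_linear` is a hypothetical helper name, interior NAs are linearly interpolated between their known neighbours, and leading or trailing NAs are extended using the slope of the nearest two known values. Rows with fewer than two known values are returned unchanged.

```r
# Hypothetical helper (not from the original answer): linear interpolation
# for interior NAs, linear extension for leading/trailing NAs.
fill_linear <- function(y) {
  x <- seq_along(y)
  known <- which(!is.na(y))
  if (length(known) < 2) return(y)  # not enough points to define a slope

  # Interior gaps: straight line between known neighbours.
  # approx() returns NA outside the known range by default (rule = 1).
  out <- approx(x[known], y[known], xout = x)$y

  # Leading gap: extend the slope of the first two known points backwards
  first <- known[1]
  second <- known[2]
  lead <- x < first
  out[lead] <- y[first] + (x[lead] - first) * (y[second] - y[first]) / (second - first)

  # Trailing gap: extend the slope of the last two known points forwards
  last <- known[length(known)]
  prev <- known[length(known) - 1]
  trail <- x > last
  out[trail] <- y[last] + (x[trail] - last) * (y[last] - y[prev]) / (last - prev)

  out
}

df <- data.frame(ID = c(1, 2), Value_1 = c(1, 7), Value_2 = c(NA, 10),
                 Value_3 = c(NA, 13), Value_4 = c(7, NA))
value_cols <- setdiff(names(df), "ID")
# apply() returns one column per input row, so transpose before assigning back
df[value_cols] <- t(apply(df[value_cols], 1, fill_linear))
df
```

On the example data this reproduces the values in df_target while keeping ID as a regular column. Unlike the increaser approach, the slope here is taken locally from neighbouring known values, so rows do not need a single constant step.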