
How to input average values calculated from specific non-NA columns into an existing average column

[Image: dataframe being used]

Please see the link for the dataframe image. I am trying to calculate the average temperature for the rows that are missing TempAvg values. Some rows have TempMin and TempMax, so the average temperature can be calculated from those. However, I need to calculate TempAvg only for rows where it does not already exist (i.e. where it is NA); otherwise I run the risk of overwriting values that are already in the TempAvg column. I tried to formulate a for loop, but after further reading I see this is not the best option. How would one go about this, given that the dataframe contains 13 million rows? To clarify, I want to keep as many rows as possible: counting the rows where TempAvg is not NA gives only about 2.5 million, so a huge amount of data would be lost if I dropped rows with NA values instead of filling them in from TempMin and TempMax.

sum(!is.na(Avg))
[1] 2535882

It sounds like you can solve this with a case_when statement. Something like:

library(tidyverse)

df %>%
  mutate(TempAvg = case_when(
    !is.na(TempAvg) ~ TempAvg,
    TRUE ~ (TempMax + TempMin) / 2
  ))

With that code, you're saying, "Create a column called TempAvg, and fill it with the existing TempAvg values where they're not NA; otherwise fill it with the average of TempMax and TempMin."
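To check the behaviour before running it on 13 million rows, the same fill-only-where-missing logic can be tried on a small made-up data frame. The sketch below uses base R logical indexing rather than case_when, so it runs without any packages; the data values are invented for illustration and are not from the original dataframe.

```r
# Toy data (values invented for illustration only)
df <- data.frame(
  TempMin = c(10, 12, NA, 8),
  TempMax = c(20, 18, 25, NA),
  TempAvg = c(15.5, NA, NA, NA)
)

# Rows where TempAvg is currently missing
missing_avg <- is.na(df$TempAvg)

# Fill only those rows; existing TempAvg values are left untouched.
# If TempMin or TempMax is NA, the result for that row stays NA.
df$TempAvg[missing_avg] <-
  (df$TempMax[missing_avg] + df$TempMin[missing_avg]) / 2

print(df)
```

Row 1 keeps its existing 15.5 rather than being recomputed from TempMin and TempMax, which is the behaviour the question asks for. The operation is vectorised, so no for loop is needed.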

Following some trial and error, the solution needed a simple fix:

Weather2021 <- Weather2021 %>%
  mutate(TempAvg = case_when(
    !is.na(TempAvg) ~ as.numeric(TempAvg),
    !is.na(TempMax) & !is.na(TempMin) ~ (TempMax + TempMin) / 2
  ))

The as.numeric() conversion seems to have been the key.
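One plausible reason the conversion mattered, which the post does not confirm, is that TempAvg had been read in as a non-numeric type such as character; case_when expects every branch to return a compatible type, so mixing a character column with a numeric calculation fails. The sketch below illustrates that assumption in base R with an invented data frame (the name `Weather` and all values are hypothetical).

```r
# Hypothetical data: TempAvg stored as character, as can happen on import
Weather <- data.frame(
  TempMin = c(10, 12, NA),
  TempMax = c(20, 18, 25),
  TempAvg = c("15.5", NA, NA),
  stringsAsFactors = FALSE
)

class(Weather$TempAvg)  # character, not numeric

# Convert once, up front, so arithmetic results can be stored in the column
Weather$TempAvg <- as.numeric(Weather$TempAvg)

# Fill only rows where TempAvg is missing and both inputs are present
fill <- is.na(Weather$TempAvg) &
  !is.na(Weather$TempMax) &
  !is.na(Weather$TempMin)
Weather$TempAvg[fill] <- (Weather$TempMax[fill] + Weather$TempMin[fill]) / 2

# Count of non-NA averages after filling
sum(!is.na(Weather$TempAvg))
```

Checking `class()` or `str()` on the real column first would show whether this is actually what happened in the original data.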
