
How to input average values calculated from specific non-NA columns into an existing average column

[Image: Dataframe being used]

Please see the link for the dataframe image. I am trying to calculate the average temperature for the rows that are missing TempAvg values. Some rows have TempMin and TempMax, so the average temperature can be calculated from those. That said, I only want to calculate TempAvg for rows where it does not already exist, i.e. where it is NA; otherwise I run the risk of overwriting values that are already in the TempAvg column. I tried to formulate a for loop, but after further reading I see that this is not the best option. How would one go about this, given that the dataframe contains 13 million rows? To clarify, I want to keep as many rows as possible: checking for rows where TempAvg is not NA shows only about 2.5 million rows, so a huge amount of data would be lost if I dropped the rows with NA values rather than filling them in from TempMin and TempMax.

sum(!is.na(Avg))
[1] 2535882

It sounds like you can solve this with a case_when statement. Something like:

library(tidyverse)

df %>%
  mutate(TempAvg = case_when(
    !is.na(TempAvg) ~ TempAvg,
    TRUE ~ (TempMax + TempMin) / 2
  ))

With that code, you're saying, "Create a column called TempAvg, and fill it with the existing TempAvg values where they're not NA; otherwise fill it with the average of TempMax and TempMin." Rows where TempMax or TempMin is also NA will still end up as NA, since the arithmetic propagates the missing value.
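To check that existing values really are left alone, here is a minimal sketch on a made-up four-row table (the toy values and the names `df`, `filled` and `filled2` are assumptions for illustration, not the real weather data). It also shows `coalesce()` from dplyr, which takes the first non-NA value per row and should give the same result for this particular case.

```r
library(dplyr)

# Hypothetical toy data standing in for the real weather table
df <- tibble(
  TempMin = c(2, 4, NA, 10),
  TempMax = c(8, 10, 15, NA),
  TempAvg = c(5.5, NA, 12, NA)
)

filled <- df %>%
  mutate(TempAvg = case_when(
    !is.na(TempAvg) ~ TempAvg,        # keep values that already exist
    TRUE ~ (TempMax + TempMin) / 2    # otherwise use the min/max midpoint
  ))

# Alternative: coalesce() falls back to the midpoint only where TempAvg is NA
filled2 <- df %>%
  mutate(TempAvg = coalesce(TempAvg, (TempMax + TempMin) / 2))

print(filled)
```

Row 1 keeps its original 5.5 even though the midpoint of 2 and 8 would be 5, and row 4 stays NA because TempMax is missing. Both versions are vectorised, so no for loop is needed for a large table.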

After some trial and error, the solution needed a simple fix:

Weather2021 <- Weather2021 %>%
  mutate(TempAvg = case_when(
    !is.na(TempAvg) ~ as.numeric(TempAvg),
    !is.na(TempMax) & !is.na(TempMin) ~ (TempMax + TempMin) / 2
  ))

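The as.numeric conversion seems to be the key, presumably because case_when needs every branch to return a compatible type and TempAvg was not stored as a plain numeric column.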
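One possible explanation, sketched below, is a type mismatch between the branches of case_when. The example assumes TempAvg was read in as character, which is only a guess; the post does not say what type the column actually had. The vectors `temp_avg`, `temp_min` and `temp_max` are hypothetical stand-ins for the real columns.

```r
library(dplyr)

# Assumption: TempAvg arrived as character (e.g. from a messy CSV import)
temp_avg <- c("5.5", NA, "12")
temp_min <- c(2, 4, 9)
temp_max <- c(8, 10, 15)

# A character branch combined with a numeric branch makes case_when fail;
# tryCatch captures the error object instead of stopping the script
mixed <- tryCatch(
  case_when(
    !is.na(temp_avg) ~ temp_avg,
    TRUE ~ (temp_max + temp_min) / 2
  ),
  error = function(e) e
)
is_error <- inherits(mixed, "error")

# Converting the existing values first gives every branch the same type
fixed <- case_when(
  !is.na(temp_avg) ~ as.numeric(temp_avg),
  !is.na(temp_max) & !is.na(temp_min) ~ (temp_max + temp_min) / 2
)

print(is_error)
print(fixed)
```

Running `class(Weather2021$TempAvg)` before the mutate would show whether this is what happened in the real data.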
