
R: Fill in missing values by back calculation

I have received a data set in which all values below 10 have been replaced by a *. However, the data set also contains row and column totals, which make a back calculation partially possible.

For the rows I have already managed this, but for the columns I lack the inspiration for how this could work.

When reading the CSV files, the * are converted to NAs, so a sample data set looks like this:

ID  V1  V2  V3  VS
A1  11  12  13  36
A2  NA  11  12  32
A3  NA  12  NA  24
AS  27  35  32  92

In this example the NA for ID A2 should be replaced by 9 [ 32 - (11 + 12) ]. The next step is to calculate the NAs for ID A3. V1 should be replaced by 7 [ 27 - (11 + 9) ] and V3 by 7 [ 32 - (13 + 12) ].

I feel like this is actually one of the simplest problems, but I just can't come up with the solution. Can anyone help me out with this?

Thanks a lot, Benne

A solution with dplyr: I do it in two steps: 1) the imputation that is performed row-wise, and 2) the column-specific imputation.

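library(dplyr)

# Sample data from the question (row AS and column VS hold the totals)
df <- read.table(text = "ID  V1  V2  V3  VS
A1  11  12  13  36
A2  NA  11  12  32
A3  NA  12  NA  24
AS  27  35  32  92", header = TRUE)

df %>%
  # Step 1: row-wise imputation for A1 and A2 (only V1 is missing here): VS - V3 - V2
  slice(1:2) %>%
  mutate(across(V1:VS, ~ ifelse(is.na(.x), VS - V3 - V2, .x))) %>%
  # Put rows A3 and AS back underneath
  rbind(df[3:4, 1:5]) %>%
  # Step 2: column-wise imputation for A3: total in the next row minus the two rows above
  mutate(across(V1:VS, ~ ifelse(is.na(.x), lead(.x) - lag(.x, 2) - lag(.x, 1), .x)))

Result:

  ID V1 V2 V3 VS
1 A1 11 12 13 36
2 A2  9 11 12 32
3 A3  7 12  7 24
4 AS 27 35 32 92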
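
The pipeline above is tied to the row positions and column names of this particular sample. A more general sketch in base R might repeatedly fill every row or column that has exactly one NA left, until nothing more can be resolved. It assumes that the last row and the last column hold the totals and that the totals themselves are never masked; the function name fill_by_totals is hypothetical and not part of any package.

```r
# Sample data from the question; row AS and column VS hold the totals.
df <- read.table(text = "ID V1 V2 V3 VS
A1 11 12 13 36
A2 NA 11 12 32
A3 NA 12 NA 24
AS 27 35 32 92", header = TRUE)

# Hypothetical helper: fill NAs by back calculation from row and column totals.
# Assumption: the last row and the last column are totals and contain no NA.
fill_by_totals <- function(df) {
  m  <- as.matrix(df[, -1])   # numeric part, without the ID column
  nr <- nrow(m)
  nc <- ncol(m)
  repeat {
    changed <- FALSE
    # Rows: a data row with exactly one NA is determined by its row total.
    for (i in seq_len(nr - 1)) {
      j <- which(is.na(m[i, -nc]))
      if (length(j) == 1) {
        m[i, j] <- m[i, nc] - sum(m[i, -nc], na.rm = TRUE)
        changed <- TRUE
      }
    }
    # Columns: a data column with exactly one NA is determined by its column total.
    for (j in seq_len(nc - 1)) {
      i <- which(is.na(m[-nr, j]))
      if (length(i) == 1) {
        m[i, j] <- m[nr, j] - sum(m[-nr, j], na.rm = TRUE)
        changed <- TRUE
      }
    }
    if (!changed) break       # nothing resolvable is left
  }
  df[, -1] <- m
  df
}

res <- fill_by_totals(df)
res   # A2/V1 becomes 9, A3/V1 becomes 7, A3/V3 becomes 7
```

Cells in a row and a column that both still have two or more NAs stay NA, since the totals alone do not determine them. Note also that the sample totals are not fully consistent: for A3/V3 the column total gives 32 - (13 + 12) = 7, while the row total would give 24 - (7 + 12) = 5. The sketch keeps whichever value is resolved first, here the column-based 7.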


 