(R) How to copy and paste values from one column based on another column and ID in R

Question

For simplicity, let's assume I have two columns. First: ID (a string code such as AA23, BA53, NA, etc.). Second: Age (18, 32, 55, 23, etc.).

IDs sometimes repeat (i.e., one person, AA23, filled in the survey on more than one day, but was asked his age only on the first day and not on the second or third day).

I want to copy the values in the Age column to every row with the same ID, so that I have a 'long format' of the data frame.

dput(data):

structure(list(Code = c("MW68", "AW80", "EW40", "BW60", "Wn36", 
"ZK45", "SI55", "MW68", "EW40", "DC06", NA, "IW28"), Age = c("52", 
"26", "34", "26", "20", "35", NA, NA, NA, NA, NA, NA)), row.names = c(5L, 
6L, 7L, 8L, 9L, 10L, 400L, 401L, 402L, 403L, 404L, 405L), class = "data.frame")

Input:

ID   Age
AA23 18
BA53 32
AC13 55
AA23 NA
BA53 NA  
AC13 NA
NA   23
AA23 NA
(the trick is that sometimes ID is NA)

And the desired output:
ID   Age
AA23 18
BA53 32
AC13 55
AA23 18
BA53 32  
AC13 55
NA   23
AA23 18

Thank you in advance!

Answer 1

I'm not quite sure I understood correctly what you want to do, but this code looks for rows where Age is NA and fills in the mean Age of the other rows with the same entry in Code. It cannot produce a value for a Code that has no Age anywhere in the table. If rows with the same Code have different Age values, it fills in their mean, since you didn't specify what to do in that case.

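# Age is stored as character in the sample data, so convert it before averaging
data$Age <- as.numeric(data$Age)

for (i in seq_len(nrow(data))) {
  # Only fill rows that have a Code but no Age
  if (!is.na(data$Code[i]) && is.na(data$Age[i])) {
    # which() drops the NA comparisons produced by rows with a missing Code
    same_code <- which(data$Code == data$Code[i])
    data$Age[i] <- mean(data$Age[same_code], na.rm = TRUE)
  }
}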

This skips rows with NA in the Code column.
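The same fill can also be sketched without a loop. This is a minimal base-R sketch rather than the answer's own code: it assumes each Code has at most one distinct Age, so it takes the first known Age per Code instead of a mean. The small data frame and the names known and to_fill are made up for illustration.

```r
# Toy data (illustrative only): Age is character, as in the question's dput()
data <- data.frame(
  Code = c("MW68", "AW80", "EW40", "MW68", "EW40", NA, "IW28"),
  Age  = c("52", "26", "34", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Rows with both a Code and an Age form the lookup table
known <- data[!is.na(data$Code) & !is.na(data$Age), ]
known <- known[!duplicated(known$Code), ]  # keep the first Age per Code

# Fill only rows where Age is missing and Code is present
to_fill <- is.na(data$Age) & !is.na(data$Code)
data$Age[to_fill] <- known$Age[match(data$Code[to_fill], known$Code)]

print(data)
```

Codes that never appear with an Age (IW28 here) have no match in the lookup table and stay NA, and rows with a missing Code are left untouched.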

Answer 2

You can also use the function coalesce, which returns, element by element, the first non-missing value among its arguments. Here each missing Age is replaced by the first Age value within its Code group (Code is the grouping variable):

library(dplyr)

df %>%
  group_by(Code) %>%
  mutate(across(Age, ~ coalesce(.x, first(.x))))

# A tibble: 12 x 2
# Groups:   Code [10]
   Code  Age  
   <chr> <chr>
 1 MW68  52   
 2 AW80  26   
 3 EW40  34   
 4 BW60  26   
 5 Wn36  20   
 6 ZK45  35   
 7 SI55  NA   
 8 MW68  52   
 9 EW40  34   
10 DC06  NA   
11 NA    NA   
12 IW28  NA
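Two edge cases are worth keeping in mind with this approach: first() returns the first value of the group even if that value is itself NA, and group_by() puts all rows with a missing Code into one group. The following is a hedged sketch of a variant that handles both, assuming dplyr is installed; the toy data frame df and the result name filled are illustrative, not part of the original answer.

```r
library(dplyr)

# Toy data (illustrative only): MW68's first row has no Age, and two rows have no Code
df <- data.frame(
  Code = c("MW68", "EW40", "MW68", NA, NA),
  Age  = c(NA, "34", "52", "23", NA),
  stringsAsFactors = FALSE
)

filled <- df %>%
  group_by(Code) %>%
  mutate(
    Age = if (is.na(first(Code))) {
      Age                                        # rows without a Code are left as they are
    } else {
      coalesce(Age, first(Age[!is.na(Age)]))     # first non-missing Age in the group
    }
  ) %>%
  ungroup()

print(filled)
```

In this sketch the first MW68 row picks up the Age recorded on a later row, while the two rows without a Code keep their original values instead of sharing one.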

Answer 3

Here's a solution based on zoo's function na.locf ("last observation carried forward", applied to NA values): first you group by Code, then you mutate the column Age using ifelse, carrying the last non-NA observation forward:

library(dplyr)  # provides %>%, group_by() and mutate()
library(zoo)

data %>%
  group_by(Code) %>%
  mutate(Age = ifelse(is.na(Age), na.locf(Age), Age))

# A tibble: 12 x 2
# Groups:   Code [10]
   Code  Age  
   <chr> <chr>
 1 MW68  52   
 2 AW80  26   
 3 EW40  34   
 4 BW60  26   
 5 Wn36  20   
 6 ZK45  35   
 7 SI55  NA   
 8 MW68  52   # <- value `carried forward`
 9 EW40  34   # <- value `carried forward`
10 DC06  NA   
11 NA    NA   
12 IW28  NA
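To make the carry-forward idea concrete, here is a minimal base-R sketch that does not depend on zoo. The helper locf and the toy data are hypothetical names introduced for illustration; the helper is a simple stand-in for the idea, not zoo's implementation.

```r
# Hypothetical helper: carry the last non-NA value forward within a vector
locf <- function(x) {
  idx <- cumsum(!is.na(x))       # position of the most recent non-NA value (0 if none yet)
  c(NA, x[!is.na(x)])[idx + 1]   # index 0 maps to the leading NA
}

# Toy data (illustrative only)
data <- data.frame(
  Code = c("MW68", "EW40", "MW68", NA, "EW40", "DC06"),
  Age  = c("52", NA, NA, "23", "34", NA),
  stringsAsFactors = FALSE
)

# Apply the helper per Code, skipping rows whose Code is missing
has_code <- !is.na(data$Code)
data$Age[has_code] <- ave(data$Age[has_code], data$Code[has_code], FUN = locf)

print(data)
```

Because values are only carried forward, the first EW40 row stays NA in this toy example: its Age is recorded on a later row. If the known value can appear after the missing ones, a lookup by Code is the safer choice.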

Answer 1
1 2021-05-23 10:53:25

Answer 2
1 Accepted 2021-05-23 11:43:05

Answer 3
0 2021-05-23 11:10:04