How to copy paste values from one column based on another column and ID in R
For simplicity, let's assume I have two columns. First: ID (a string of codes such as AA23, BA53, NA, etc.). Second: Age (18, 32, 55, 23, etc.).
IDs sometimes repeat (i.e., one person, AA23, filled in the survey on several days, but was asked his age only on the first day, not on the second or third).
I want to copy the values in the Age column to every row with the same ID, so that I have a 'long format' of the data frame.
dput(data):
structure(list(Code = c("MW68", "AW80", "EW40", "BW60", "Wn36",
"ZK45", "SI55", "MW68", "EW40", "DC06", NA, "IW28"), Age = c("52",
"26", "34", "26", "20", "35", NA, NA, NA, NA, NA, NA)), row.names = c(5L,
6L, 7L, 8L, 9L, 10L, 400L, 401L, 402L, 403L, 404L, 405L), class = "data.frame")
Input:
ID Age
AA23 18
BA53 32
AC13 55
AA23 NA
BA53 NA
AC13 NA
NA 23
AA23 NA
(the trick is that sometimes ID is NA)
And the desired output:
ID Age
AA23 18
BA53 32
AC13 55
AA23 18
BA53 32
AC13 55
NA 23
AA23 18
Thank you in advance!
I'm not quite sure I understood correctly what you want to do, but the code below looks for rows where Age is NA and fills in the mean Age of the other rows with the same entry in Code. Obviously, it cannot fill anything in for values of Code that have no Age value anywhere in the table. If different rows with the same Code have different Age values, it fills in their mean, since you didn't specify what to do in that case.
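# Age is stored as character in the dput() output; mean() needs numbers
data$Age <- as.numeric(data$Age)
for (i in seq_len(nrow(data))) {
  # only fill rows that have a Code but no Age
  if (!is.na(data$Code[i]) && is.na(data$Age[i])) {
    same_code <- which(data$Code == data$Code[i])
    data$Age[i] <- mean(data$Age[same_code], na.rm = TRUE)
  }
}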
This skips rows with NA in the Code column.
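The same idea can be sketched without an explicit loop using base R's ave(), which applies a function within groups. This is only one possible variant, not the answerer's code; the helper names age, has_code, group_mean and filled are made up for illustration, and groups with no known Age are assumed to stay NA.

```r
# Data from the question's dput() output
data <- structure(list(Code = c("MW68", "AW80", "EW40", "BW60", "Wn36",
  "ZK45", "SI55", "MW68", "EW40", "DC06", NA, "IW28"), Age = c("52",
  "26", "34", "26", "20", "35", NA, NA, NA, NA, NA, NA)), row.names = c(5L,
  6L, 7L, 8L, 9L, 10L, 400L, 401L, 402L, 403L, 404L, 405L), class = "data.frame")

age      <- as.numeric(data$Age)   # Age is character in the dput() output
has_code <- !is.na(data$Code)      # rows with a missing Code are left alone

# Mean Age per Code, repeated for every row of that Code
# (NaN for a Code that has no known Age at all)
group_mean <- ave(age[has_code], data$Code[has_code],
                  FUN = function(x) mean(x, na.rm = TRUE))

# Keep known ages, fill missing ones with the group mean
filled <- age
filled[has_code] <- ifelse(is.na(age[has_code]), group_mean, age[has_code])
filled[is.nan(filled)] <- NA       # assumption: unknown groups stay NA

data$Age <- filled
```

Rows 401 and 402 (MW68 and EW40) pick up the ages recorded earlier for the same Code; every other missing value stays NA because no age is known for that Code.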
You can also use the function coalesce, which replaces each NA with a fallback value you supply; here we want the fallback to be the first Age value within each Code group (Code is the grouping variable):
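library(dplyr)
# `data` is the data frame from the question's dput() output
data %>%
  group_by(Code) %>%
  # replace each NA in Age with the first Age recorded for the same Code
  mutate(across(Age, ~ coalesce(.x, first(.x))))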
# A tibble: 12 x 2
# Groups: Code [10]
Code Age
<chr> <chr>
1 MW68 52
2 AW80 26
3 EW40 34
4 BW60 26
5 Wn36 20
6 ZK45 35
7 SI55 NA
8 MW68 52
9 EW40 34
10 DC06 NA
11 NA NA
12 IW28 NA
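Note that first(.x) takes the first row of each group, which works here because the age was recorded on the first day. If the first row of a group could itself be NA, a possible variant is to fall back to the first non-missing value instead. The sketch below uses a small hypothetical data frame (df and res are illustrative names, not from the question):

```r
library(dplyr)

# Hypothetical data: for "BA53" the first row is missing, the age comes later
df <- data.frame(
  Code = c("AA23", "BA53", "AA23", "BA53"),
  Age  = c("18", NA, NA, "32")
)

res <- df %>%
  group_by(Code) %>%
  # Age[!is.na(Age)][1] is the first non-missing Age in the group
  # (NA if the group has no known Age at all)
  mutate(Age = coalesce(Age, Age[!is.na(Age)][1])) %>%
  ungroup()
```

With this fallback both BA53 rows end up with "32", whereas coalesce(Age, first(Age)) would leave the first BA53 row as NA.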
Here's a solution based on zoo's function na.locf ("in the case of NA, last observation carried forward"): first you group by Code, then you mutate column Age using ifelse, carrying the last non-NA observation forward:
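library(dplyr)  # provides %>%, group_by() and mutate()
library(zoo)
data %>%
  group_by(Code) %>%
  # na.rm = FALSE keeps leading NAs so the result has the same length as Age
  mutate(Age = ifelse(is.na(Age), na.locf(Age, na.rm = FALSE), Age))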
# A tibble: 12 x 2
# Groups: Code [10]
Code Age
<chr> <chr>
1 MW68 52
2 AW80 26
3 EW40 34
4 BW60 26
5 Wn36 20
6 ZK45 35
7 SI55 NA
8 MW68 52 # <- value `carried forward`
9 EW40 34 # <- value `carried forward`
10 DC06 NA
11 NA NA
12 IW28 NA
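One thing to watch with any group_by-based approach: dplyr puts all rows with a missing ID into a single NA group, so ages could be carried between unrelated respondents whose ID is missing. A possible guard, sketched on the Input table from the question, is to leave the NA group untouched. The last row (a second NA ID with no age) is a hypothetical addition to show the effect, and input and res are illustrative names:

```r
library(dplyr)
library(zoo)

# Input table from the question, plus one hypothetical extra row
# with a missing ID and a missing Age
input <- data.frame(
  ID  = c("AA23", "BA53", "AC13", "AA23", "BA53", "AC13", NA, "AA23", NA),
  Age = c(18, 32, 55, NA, NA, NA, 23, NA, NA)
)

res <- input %>%
  group_by(ID) %>%
  # rows whose ID is NA keep their Age as-is; all other groups carry
  # the last known Age forward
  mutate(Age = if (is.na(ID[1])) Age else na.locf(Age, na.rm = FALSE)) %>%
  ungroup()
```

The rows with known IDs match the desired output in the question, the NA-ID row with Age 23 is unchanged, and the added NA-ID row stays NA instead of inheriting 23.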