R - Add a column on a dataset based on values from another dataset
I have a dataset (df1) in which one column holds the Remaining_points for each owner.
df1:
Id Owner Remaining_points
00001 John 18
00008 Paul 34
00011 Alba 52
00004 Martha 67
And another dataset (df2), with different IDs, that contains Points:
Id Points
00025 17
00076 35
00089 51
00092 68
I need to add an Owner column to df2, where each row gets the Owner whose Remaining_points in df1 is closest to that row's Points.
Desired data frame:
Id Points Owner
00025 17 John
00076 35 Paul
00089 51 Alba
00092 68 Martha
Could someone please help me? I usually work with dplyr, but any solution would be appreciated.
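df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))
df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))
# outer() builds the matrix of absolute differences: rows index df1, columns index df2.
# For each df2 column, flag the df1 row(s) at the minimum distance and collect their positions.
# Note: this assumes a unique closest match per df2 row; a tie adds extra rows to ind.
ind <- which(apply(abs(outer(df1$Remaining_points, df2$Points, "-")), 2, function(x) x == min(x)), arr.ind = TRUE)
# Column 1 of ind holds the matching df1 row for each df2 row.
df2$Owner <- df1$Owner[ind[, 1]]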
df2
ID Points Owner
1 00025 17 John
2 00076 35 Paul
3 00089 51 Alba
4 00092 68 Martha
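As a variation on the outer() approach, which.min() returns exactly one df1 row per df2 row, so a tie cannot produce more matches than df2 has rows. This is only a minimal sketch: it assumes that taking the first of several equally close owners is acceptable, and dist_mat and nearest are names introduced here for illustration.

```r
df1 <- data.frame(
  ID = c("00001", "00008", "00011", "00004"),
  Owner = c("John", "Paul", "Alba", "Martha"),
  Remaining_points = c(18, 34, 52, 67)
)
df2 <- data.frame(
  ID = c("00025", "00076", "00089", "00092"),
  Points = c(17, 35, 51, 68)
)

# Absolute differences: rows index df1, columns index df2.
dist_mat <- abs(outer(df1$Remaining_points, df2$Points, "-"))

# For each df2 column, the position of the first minimum (assumption: first match wins on ties).
nearest <- apply(dist_mat, 2, which.min)

df2$Owner <- df1$Owner[nearest]
df2
```

If ties should be reported rather than silently resolved, the original which(..., arr.ind = TRUE) form is the one to inspect, since it returns every row at the minimum distance.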
@tacoman's answer works well, but I couldn't resist including a dplyr version. The cross join does a job similar to @tacoman's outer().
df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))
df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))
df1 |>
  # This is essentially a cross join because no key is used.
  # In dplyr >= 1.1.0, dplyr::cross_join(df2) is the preferred spelling.
  dplyr::full_join(df2, by = character()) |>
  dplyr::mutate(
    distance = abs(Points - Remaining_points), # Find the difference for every pairing
  ) |>
  dplyr::group_by(ID_2) |>                     # Isolate each df2 ID in its own sub-dataset
  dplyr::mutate(
    rank = dplyr::row_number(distance),        # Rank the distances. The closest will be '1'.
  ) |>
  dplyr::filter(rank == 1L) |>                 # Keep only the closest
  dplyr::ungroup() |>
  dplyr::select(
    ID_2,
    Points,
    Owner
  )
Result:
# A tibble: 4 x 3
ID_2 Points Owner
<chr> <dbl> <chr>
1 00025 17 John
2 00076 35 Paul
3 00089 51 Alba
4 00092 68 Martha
Here is the intermediate result (before the extra rows and columns are dropped):
# A tibble: 16 x 7
# Groups: ID_2 [4]
ID_1 Owner Remaining_points ID_2 Points distance rank
<chr> <chr> <dbl> <chr> <dbl> <dbl> <int>
1 00001 John 18 00025 17 1 1 # <- closest for John
2 00001 John 18 00076 35 17 2
3 00001 John 18 00089 51 33 4
4 00001 John 18 00092 68 50 4
5 00008 Paul 34 00025 17 17 2
6 00008 Paul 34 00076 35 1 1 # <- closest for Paul
7 00008 Paul 34 00089 51 17 3
8 00008 Paul 34 00092 68 34 3
9 00011 Alba 52 00025 17 35 3
10 00011 Alba 52 00076 35 17 3
11 00011 Alba 52 00089 51 1 1 # <- closest for Alba
12 00011 Alba 52 00092 68 16 2
13 00004 Martha 67 00025 17 50 4
14 00004 Martha 67 00076 35 32 4
15 00004 Martha 67 00089 51 16 2
16 00004 Martha 67 00092 68 1 1 # <- closest for Martha
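The rank-and-filter steps can probably be collapsed with dplyr::slice_min(). The following is a sketch under a couple of assumptions: dplyr 1.1.0 or later is installed (for cross_join()), and keeping a single row per df2 ID is acceptable when two owners are equally close. The name result is introduced here for illustration.

```r
df1 <- data.frame(
  ID_1 = c("00001", "00008", "00011", "00004"),
  Owner = c("John", "Paul", "Alba", "Martha"),
  Remaining_points = c(18, 34, 52, 67)
)
df2 <- data.frame(
  ID_2 = c("00025", "00076", "00089", "00092"),
  Points = c(17, 35, 51, 68)
)

result <- df1 |>
  dplyr::cross_join(df2) |>                                  # Every df1 row paired with every df2 row
  dplyr::mutate(distance = abs(Points - Remaining_points)) |>
  dplyr::group_by(ID_2) |>
  dplyr::slice_min(distance, n = 1, with_ties = FALSE) |>    # Keep one closest owner per df2 ID
  dplyr::ungroup() |>
  dplyr::select(ID_2, Points, Owner)

result
```

Setting with_ties = TRUE instead would keep every equally close owner, which makes ambiguous matches visible at the cost of possibly returning more rows than df2 has.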