R - Add a column on a dataset based on values from another dataset
I have a dataset (df1) with a column that contains the Remaining_points for each owner.
df1:
Id Owner Remaining_points
00001 John 18
00008 Paul 34
00011 Alba 52
00004 Martha 67
And another dataset (df2), with different ids, that contains Points:
Id Points
00025 17
00076 35
00089 51
00092 68
I need to add an Owner column to df2, where each row gets the Owner from df1 whose Remaining_points is closest to that row's Points.
Desired data frame:
Id Points Owner
00025 17 John
00076 35 Paul
00089 51 Alba
00092 68 Martha
Could someone help me, please? I usually work with dplyr, but any solution would be appreciated.
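df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))
df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))
# outer() builds the matrix of absolute differences: rows are df1 owners, columns are df2 rows.
# Within each column, flag the smallest difference and collect its (row, col) position.
ind <- which(apply(abs(outer(df1$Remaining_points, df2$Points, "-")), 2,
                   function(x) x == min(x)), arr.ind = TRUE)
# The row index of each flagged cell picks the matching owner from df1.
df2$Owner <- df1$Owner[ind[, 1]]
df2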
ID Points Owner
1 00025 17 John
2 00076 35 Paul
3 00089 51 Alba
4 00092 68 Martha
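As a hedged variation on the outer() approach above (a sketch, not the answerer's own code), `which.min()` can pick the closest owner per column directly. The names `dist_mat` and `nearest` are illustrative. `which.min()` returns only the first minimum, so this sketch assumes that taking the first owner in `df1` order is an acceptable way to break ties.

```r
df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))
df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

# Absolute differences: one row per df1 owner, one column per df2 row.
dist_mat <- abs(outer(df1$Remaining_points, df2$Points, "-"))

# For each column (each df2 row), the index of the closest df1 row.
# which.min() keeps the first minimum, so ties resolve to the earlier owner.
nearest <- apply(dist_mat, 2, which.min)

df2$Owner <- df1$Owner[nearest]
df2
```

When two owners are equally close to one Points value, the `x == min(x)` comparison flags both of them, so the index would hold more entries than `df2` has rows. With `which.min()` there is always exactly one owner per row.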
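@tacoman's answer works well, but I couldn't resist including a dplyr version. The cross join does a job similar to @tacoman's outer(): it pairs every row of df1 with every row of df2.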
df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))
df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))
df1 |>
  dplyr::full_join(df2, by = character()) |>     # This is essentially a cross join b/c no key is used.
  dplyr::mutate(
    distance = abs(Points - Remaining_points),   # Find the difference in all possibilities
  ) |>
  dplyr::group_by(ID_2) |>                       # Isolate each ID in its own sub-dataset
  dplyr::mutate(
    rank = dplyr::row_number(distance),          # Rank the distances. The closest will be '1'.
  ) |>
  dplyr::filter(rank == 1L) |>                   # Keep only the closest
  dplyr::ungroup() |>
  dplyr::select(
    ID_2,
    Points,
    Owner
  )
Result:
# A tibble: 4 x 3
ID_2 Points Owner
<chr> <dbl> <chr>
1 00025 17 John
2 00076 35 Paul
3 00089 51 Alba
4 00092 68 Martha
Here is the intermediate result (before the extra rows and columns are removed):
# A tibble: 16 x 7
# Groups: ID_2 [4]
ID_1 Owner Remaining_points ID_2 Points distance rank
<chr> <chr> <dbl> <chr> <dbl> <dbl> <int>
1 00001 John 18 00025 17 1 1 # <- closest for John
2 00001 John 18 00076 35 17 2
3 00001 John 18 00089 51 33 4
4 00001 John 18 00092 68 50 4
5 00008 Paul 34 00025 17 17 2
6 00008 Paul 34 00076 35 1 1 # <- closest for Paul
7 00008 Paul 34 00089 51 17 3
8 00008 Paul 34 00092 68 34 3
9 00011 Alba 52 00025 17 35 3
10 00011 Alba 52 00076 35 17 3
11 00011 Alba 52 00089 51 1 1 # <- closest for Alba
12 00011 Alba 52 00092 68 16 2
13 00004 Martha 67 00025 17 50 4
14 00004 Martha 67 00076 35 32 4
15 00004 Martha 67 00089 51 16 2
16 00004 Martha 67 00092 68 1 1 # <- closest for Martha
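For readers on a recent dplyr, here is a hedged sketch of the same pipeline. It assumes dplyr 1.1.0 or later, where `cross_join()` was added and `by = character()` was deprecated for cross joins. It also swaps the rank-and-filter step for `slice_min()`. The name `result` is illustrative, and `with_ties = FALSE` is a choice made here so that each `ID_2` keeps exactly one row.

```r
library(dplyr)

df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))
df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

result <- df2 |>
  cross_join(df1) |>                                  # Every df2 row paired with every df1 row (16 rows)
  mutate(distance = abs(Points - Remaining_points)) |>
  group_by(ID_2) |>
  slice_min(distance, n = 1, with_ties = FALSE) |>    # Keep the single closest owner per ID_2
  ungroup() |>
  select(ID_2, Points, Owner)

result
```

The cross join grows as nrow(df1) times nrow(df2), so this sketch, like the original, suits small to moderate tables.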