R - Add a column to a dataset based on the values of another dataset

Question

I have a dataset (df1) with a column that contains Remaining_points for each owner.

df1:

Id      Owner   Remaining_points
00001   John    18
00008   Paul    34
00011   Alba    52
00004   Martha  67

And another one (df2), with different IDs, that contains Points:

Id      Points
00025   17
00076   35
00089   51
00092   68

I need to add an Owner column to df2, holding for each row the owner whose Remaining_points in df1 is closest to that row's Points.

Desired dataframe:

Id      Points  Owner
00025   17      John
00076   35      Paul
00089   51      Alba
00092   68      Martha

Could anyone please help me with this? I'm used to working with dplyr, but any solution would be much appreciated.

Answer 1

df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

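# Absolute difference between every Remaining_points (rows) and every Points (columns);
# for each column, flag the row holding the minimum and return its (row, col) position.
ind <- which(apply(abs(outer(df1$Remaining_points, df2$Points, "-")), 2, function(x) x == min(x)), arr.ind = TRUE)
# ind[, 1] is the df1 row of the closest owner for each df2 row
df2$Owner <- df1$Owner[ind[, 1]]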
df2
     ID Points  Owner
1 00025     17   John
2 00076     35   Paul
3 00089     51   Alba
4 00092     68 Martha
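
One caveat with the approach above: if two owners are equally close to a Points value, `x == min(x)` flags both of them, so `ind` ends up with more rows than df2 and the assignment fails. A minimal sketch of a tie-safe variant, assuming it is acceptable to keep the first of the equally close owners, uses which.min() instead (the names `dist_mat` and `nearest` are only illustrative):

```r
df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

# Rows = df1 owners, columns = df2 rows; each cell is |Remaining_points - Points|
dist_mat <- abs(outer(df1$Remaining_points, df2$Points, "-"))

# which.min() returns exactly one index per column (the first minimum on ties)
nearest <- apply(dist_mat, 2, which.min)

df2$Owner <- df1$Owner[nearest]
df2
```

On the sample data this gives the same owners as the original code, since no distance is tied for the minimum.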

Answer 2

@tacoman's answer works well, but I couldn't resist including a dplyr version. The cross join does a job similar to @tacoman's outer().

df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))


df1 |> 
  dplyr::full_join(df2, by = character()) |>    # Essentially a cross join, because no key is used (dplyr >= 1.1.0 provides dplyr::cross_join() for this).
  dplyr::mutate(
    distance  = abs(Points - Remaining_points), # Find the difference in all possibilities
  ) |> 
  dplyr::group_by(ID_2) |>                      # Isolate each ID in its own sub-dataset 
  dplyr::mutate(
    rank      = dplyr::row_number(distance),    # Rank the distances. The closest will be '1'.
  ) |> 
  dplyr::filter(rank == 1L) |>                  # Keep only the closest
  dplyr::ungroup() |> 
  dplyr::select(
    ID_2,
    Points,
    Owner
  )

Result:

# A tibble: 4 x 3
  ID_2  Points Owner 
  <chr>  <dbl> <chr> 
1 00025     17 John  
2 00076     35 Paul  
3 00089     51 Alba  
4 00092     68 Martha

This is the intermediate result (before removing the extra rows and columns):

# A tibble: 16 x 7
# Groups:   ID_2 [4]
   ID_1  Owner  Remaining_points ID_2  Points distance  rank
   <chr> <chr>             <dbl> <chr>  <dbl>    <dbl> <int>
 1 00001 John                 18 00025     17        1     1 # <- closest for John
 2 00001 John                 18 00076     35       17     2
 3 00001 John                 18 00089     51       33     4
 4 00001 John                 18 00092     68       50     4
 5 00008 Paul                 34 00025     17       17     2
 6 00008 Paul                 34 00076     35        1     1 # <- closest for Paul
 7 00008 Paul                 34 00089     51       17     3
 8 00008 Paul                 34 00092     68       34     3
 9 00011 Alba                 52 00025     17       35     3
10 00011 Alba                 52 00076     35       17     3
11 00011 Alba                 52 00089     51        1     1 # <- closest for Alba
12 00011 Alba                 52 00092     68       16     2
13 00004 Martha               67 00025     17       50     4
14 00004 Martha               67 00076     35       32     4
15 00004 Martha               67 00089     51       16     2
16 00004 Martha               67 00092     68        1     1 # <- closest for Martha
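
For readers without dplyr, the same cross join, rank, and filter steps can be sketched in base R. This is only an illustration under a few assumptions: merge() with `by = NULL` is used to build the Cartesian product, ave() with `rank(ties.method = "first")` stands in for the grouped row_number(), and the names `pairs` and `closest` are made up for this example.

```r
df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

# by = NULL makes merge() pair every df1 row with every df2 row (16 rows here)
pairs <- merge(df1, df2, by = NULL)
pairs$distance <- abs(pairs$Points - pairs$Remaining_points)

# Rank the distances within each ID_2; ties are broken by row order
pairs$rank <- ave(pairs$distance, pairs$ID_2,
                  FUN = function(d) rank(d, ties.method = "first"))

# Keep only the closest owner for each df2 row
closest <- pairs[pairs$rank == 1, c("ID_2", "Points", "Owner")]
closest <- closest[order(closest$ID_2), ]
closest
```

The `pairs` data frame holds the same 16 combinations as the intermediate tibble above, though not necessarily in the same row order.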

Answer 1
2 2022-11-11 14:29:37

Answer 2
1 2022-11-14 18:56:53
