R - Add a column on a dataset based on values from another dataset

I have a dataset (df1) with a column that contains Remaining_points for each owner.

df1:

Id      Owner   Remaining_points
00001   John    18
00008   Paul    34
00011   Alba    52
00004   Martha  67

And another one (df2), with different IDs, that contains Points.

df2:

Id      Points
00025   17
00076   35
00089   51
00092   68

I need to add an Owner column to df2, taking for each row the Owner from df1 whose Remaining_points is closest to that row's Points.

Desired dataframe:

Id      Points  Owner
00025   17      John
00076   35      Paul
00089   51      Alba
00092   68      Martha

Could anyone please help me with this? I'm used to working with dplyr, but any solution would be much appreciated.

df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

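# Absolute difference between every Remaining_points (rows) and every Points (columns);
# within each column, flag the row holding the smallest difference.
ind <- which(apply(abs(outer(df1$Remaining_points, df2$Points, "-")), 2,
                   function(x) x == min(x)),
             arr.ind = TRUE)
# ind[, 1] holds the matching df1 row for each df2 row
df2$Owner <- df1$Owner[ind[, 1]]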
df2
     ID Points  Owner
1 00025     17   John
2 00076     35   Paul
3 00089     51   Alba
4 00092     68 Martha
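
One caveat with the `x == min(x)` test: if two owners are equally close to a Points value, both are flagged, so `ind` gains extra rows and no longer matches df2's row count. A minimal sketch of a tie-safe variant follows. It assumes it is acceptable to keep the first owner in df1 order when distances tie. The names `dist_mat` and `nearest` are introduced here for illustration only.

```r
df1 <- data.frame(ID = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

# Rows index df1, columns index df2; each cell is |Remaining_points - Points|
dist_mat <- abs(outer(df1$Remaining_points, df2$Points, "-"))

# For every df2 row (one column of dist_mat), the index of the closest df1 row.
# which.min() returns the first minimum, so ties resolve to the earlier df1 row.
nearest <- apply(dist_mat, 2, which.min)

df2$Owner <- df1$Owner[nearest]
df2
```

Because `which.min()` always returns exactly one index per column, `nearest` has one entry per df2 row regardless of ties.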

@tacoman's answer works well, but I couldn't resist including a dplyr version. The cross join does a job similar to @tacoman's outer().

df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))


df1 |> 
  dplyr::full_join(df2, by = character()) |>    # Effectively a cross join, since no key is used (dplyr >= 1.1.0 offers dplyr::cross_join(df2)).
  dplyr::mutate(
    distance  = abs(Points - Remaining_points), # Find the difference in all possibilities
  ) |> 
  dplyr::group_by(ID_2) |>                      # Isolate each ID in its own sub-dataset 
  dplyr::mutate(
    rank      = dplyr::row_number(distance),    # Rank the distances. The closest will be '1'.
  ) |> 
  dplyr::filter(rank == 1L) |>                  # Keep only the closest
  dplyr::ungroup() |> 
  dplyr::select(
    ID_2,
    Points,
    Owner
  )

Result:

# A tibble: 4 x 3
  ID_2  Points Owner 
  <chr>  <dbl> <chr> 
1 00025     17 John  
2 00076     35 Paul  
3 00089     51 Alba  
4 00092     68 Martha

This is the intermediate result (before removing the extra rows and columns):

# A tibble: 16 x 7
# Groups:   ID_2 [4]
   ID_1  Owner  Remaining_points ID_2  Points distance  rank
   <chr> <chr>             <dbl> <chr>  <dbl>    <dbl> <int>
 1 00001 John                 18 00025     17        1     1 # <- closest for John
 2 00001 John                 18 00076     35       17     2
 3 00001 John                 18 00089     51       33     4
 4 00001 John                 18 00092     68       50     4
 5 00008 Paul                 34 00025     17       17     2
 6 00008 Paul                 34 00076     35        1     1 # <- closest for Paul
 7 00008 Paul                 34 00089     51       17     3
 8 00008 Paul                 34 00092     68       34     3
 9 00011 Alba                 52 00025     17       35     3
10 00011 Alba                 52 00076     35       17     3
11 00011 Alba                 52 00089     51        1     1 # <- closest for Alba
12 00011 Alba                 52 00092     68       16     2
13 00004 Martha               67 00025     17       50     4
14 00004 Martha               67 00076     35       32     4
15 00004 Martha               67 00089     51       16     2
16 00004 Martha               67 00092     68        1     1 # <- closest for Martha
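
The rank-then-filter steps in the pipeline above can also be collapsed into a single call. This is only a sketch, assuming dplyr >= 1.1.0 (for `cross_join()`) and that keeping one arbitrary owner on ties is acceptable. The name `result` is introduced here for illustration.

```r
library(dplyr)

df1 <- data.frame(ID_1 = c("00001", "00008", "00011", "00004"),
                  Owner = c("John", "Paul", "Alba", "Martha"),
                  Remaining_points = c(18, 34, 52, 67))

df2 <- data.frame(ID_2 = c("00025", "00076", "00089", "00092"),
                  Points = c(17, 35, 51, 68))

result <- df1 |>
  cross_join(df2) |>                                   # Pair every df1 row with every df2 row
  mutate(distance = abs(Points - Remaining_points)) |> # Distance for each pairing
  group_by(ID_2) |>                                    # One group per df2 row
  slice_min(distance, n = 1, with_ties = FALSE) |>     # Keep the single closest owner per group
  ungroup() |>
  select(ID_2, Points, Owner)

result
```

Setting `with_ties = FALSE` guarantees one row per `ID_2`, which matches what `row_number()` followed by `filter(rank == 1L)` does in the original pipeline.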


