Match a dataset against another dataset and assign the corresponding values using R

Question

Consider the dataset (D1) below:

------------------
value_1 | value_2
------------------
  0.05  |   0.56
  0.10  |   0.78
  0.80  |   0.98
  0.45  |   1.50
  0.06  |   2.79
------------------

I need to match the above dataset against the dataset (D2) below:

-----------------------------------------------
range_v1 | sd_value_v1 | range_v2 | sd_value_v2
-----------------------------------------------
   0.2   |     1       |   0.50   |     1
   0.4   |     2       |   0.75   |     2
   0.6   |     3       |   0.90   |     3
   0.8   |     4       |   1.50   |     4
   1.0   |     5       |   3.0    |     5
------------------------------------------------

I need to match D1 against D2 and assign 'sd_value_v1' and 'sd_value_v2' according to value_1 and value_2.

D2 specifies that if value_1 is less than or equal to 0.2, then an sd_value_v1 of 1 is assigned to it. Similarly, if the value is greater than 0.2 and at most 0.4, then an sd_value_v1 of 2 is assigned to that value of value_1.

Example:

value_1 = 0.10

Then on matching with D2, I should get an sd_value_v1 of 1.

Sample ranges for v1:

0 to 0.2 --> 1

0.21 to 0.4 --> 2

0.41 to 0.6 --> 3

0.61 to 0.8 --> 4

0.81 to 1.0 --> 5

Expected Output:

---------------------------------------------
value_1 | sd_value_v1 | value_2 | sd_value_v2
---------------------------------------------
  0.05  |      1      |   0.56  |     2
  0.10  |      1      |   0.78  |     3
  0.80  |      4      |   0.98  |     4
  0.45  |      3      |   1.50  |     4
  0.06  |      1      |   2.79  |     5
---------------------------------------------

I am currently using R to solve this problem. Any input would be really helpful.

Answer 1

In base R, we could use mapply with cut, taking the breaks from the range.. columns and the labels from the sd.. columns, to get the sd_value columns.

df1[paste0("sd_value", seq_len(ncol(df1)))] <- 
      mapply(function(x, y, z) cut(x, breaks = c(-Inf, y), labels = z), 
      df1, df2[c(TRUE, FALSE)], df2[c(FALSE, TRUE)])

df1
#  value_1 value_2 sd_value1 sd_value2
#1    0.05    0.56         1         2
#2    0.10    0.78         1         3
#3    0.80    0.98         4         4
#4    0.45    1.50         3         4
#5    0.06    2.79         1         5
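
To see what each mapply iteration does, it may help to run cut on a single column. This is a minimal sketch with the value_1, range_v1 and sd_value_v1 values typed inline; the names x, upper, labs, res and sd_v1 are illustrative and not part of the original answer.

```r
# Single-column version of the cut() step (illustrative names).
x     <- c(0.05, 0.10, 0.80, 0.45, 0.06)   # value_1 from D1
upper <- c(0.2, 0.4, 0.6, 0.8, 1.0)        # range_v1 from D2 (upper bounds)
labs  <- 1:5                               # sd_value_v1 from D2

# Prepending -Inf gives six break points, hence five intervals:
# (-Inf, 0.2], (0.2, 0.4], ..., (0.8, 1.0]. cut() closes intervals
# on the right by default, so 0.80 falls in (0.6, 0.8].
res <- cut(x, breaks = c(-Inf, upper), labels = labs)

# cut() returns a factor; go through character to recover the numbers.
sd_v1 <- as.integer(as.character(res))
sd_v1
# [1] 1 1 4 3 1
```

Any value above the largest break (1.0 here) falls outside every interval and comes back as NA.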

Column selection depends on how the columns are laid out in your actual df2. In this example the range.. and sd_value.. columns alternate, so I used df2[c(TRUE, FALSE)] and df2[c(FALSE, TRUE)] to pick every other column. If your columns are arranged differently, you can use grep to get the column indices by name:

range_cols <- grep("^range", names(df2))
sd_cols <- grep("^sd", names(df2))

and then use them in mapply:

df1[paste0("sd_value", seq_len(ncol(df1)))] <- 
          mapply(function(x, y, z) cut(x, breaks = c(-Inf, y), labels = z), 
          df1, df2[range_cols], df2[sd_cols])
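
Both variants pair columns by position, so they rely on the range.. and sd_value.. columns appearing in the same order as the columns of df1. As a hedged alternative, the sketch below pairs columns by their numeric suffix instead. It assumes the naming pattern seen in this example (value_1 with range_v1 and sd_value_v1, and so on); the loop and the output column names are illustrative, not part of the original answer.

```r
# Example data, same values as in the question.
df1 <- data.frame(value_1 = c(0.05, 0.10, 0.80, 0.45, 0.06),
                  value_2 = c(0.56, 0.78, 0.98, 1.50, 2.79))
df2 <- data.frame(range_v1 = c(0.2, 0.4, 0.6, 0.8, 1.0), sd_value_v1 = 1:5,
                  range_v2 = c(0.50, 0.75, 0.90, 1.50, 3.0), sd_value_v2 = 1:5)

# Pair columns by suffix (assumed naming pattern), one variable at a time.
for (i in 1:2) {
  x     <- df1[[paste0("value_", i)]]
  upper <- df2[[paste0("range_v", i)]]
  labs  <- df2[[paste0("sd_value_v", i)]]
  # as.integer() on the factor gives the interval index (1..5),
  # which is then used to look up the matching sd value.
  idx <- as.integer(cut(x, breaks = c(-Inf, upper)))
  df1[[paste0("sd_value_v", i)]] <- labs[idx]
}

df1
```

Looking the labels up by index keeps the new columns as plain integers rather than factors, which is usually easier to work with afterwards.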

Answer 2

Here is a method using the tidyverse:

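library(tidyverse)
# ..1 = a column of df1, ..2 = its range column, ..3 = its sd_value column
list(df1, df2[c(1, 3)], df2[c(2, 4)]) %>%
  # findInterval() counts the breaks below each value; + 1 gives the label index
  pmap(~ ..3[findInterval(..1, ..2, left.open = TRUE) + 1]) %>%
  set_names(str_c("sd_value", seq_along(.))) %>%
  bind_cols(df1, .)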
#   value_1 value_2 sd_value1 sd_value2
#1    0.05    0.56         1         2
#2    0.10    0.78         1         3
#3    0.80    0.98         4         4
#4    0.45    1.50         3         4
#5    0.06    2.79         1         5
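
The lookup inside pmap can be traced on a single column. This is a minimal base R sketch using the value_2, range_v2 and sd_value_v2 values typed inline; the names x, upper, labs, idx and sd_v2 are illustrative and not part of the original answer.

```r
# Single-column version of the findInterval() lookup (illustrative names).
x     <- c(0.56, 0.78, 0.98, 1.50, 2.79)   # value_2 from D1
upper <- c(0.50, 0.75, 0.90, 1.50, 3.0)    # range_v2 from D2 (upper bounds)
labs  <- 1:5                               # sd_value_v2 from D2

# With left.open = TRUE the intervals are (upper[i], upper[i + 1]], so a
# value equal to a bound (1.50) stays in the lower bin. The result counts
# how many bounds lie strictly below each value; adding 1 turns that count
# into the position of the matching label.
idx   <- findInterval(x, upper, left.open = TRUE) + 1
sd_v2 <- labs[idx]
sd_v2
# [1] 2 3 4 4 5
```

A value above the last bound (3.0 here) yields an index of 6, which is past the end of the labels and therefore returns NA.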

Data

df1 <- structure(list(value_1 = c(0.05, 0.1, 0.8, 0.45, 0.06), value_2 = c(0.56, 
0.78, 0.98, 1.5, 2.79)), class = "data.frame", row.names = c(NA, 
-5L))

df2 <- structure(list(range_v1 = c(0.2, 0.4, 0.6, 0.8, 1), sd_value_v1 = 1:5, 
    range_v2 = c(0.5, 0.75, 0.9, 1.5, 3), sd_value_v2 = 1:5), 
    class = "data.frame", row.names = c(NA, 
-5L))
