Matching a dataset with another dataset and assigning the respective values using R

Consider the dataset (D1) below:

------------------
value_1 | value_2
------------------
  0.05  |   0.56
  0.10  |   0.78
  0.80  |   0.98
  0.45  |   1.50
  0.06  |   2.79
------------------

I need to match the above dataset with the dataset (D2) below:

-----------------------------------------------
range_v1 | sd_value_v1 | range_v2 | sd_value_v2
-----------------------------------------------
   0.2   |     1       |   0.50   |     1
   0.4   |     2       |   0.75   |     2
   0.6   |     3       |   0.90   |     3
   0.8   |     4       |   1.50   |     4
   1.0   |     5       |   3.0    |     5
------------------------------------------------

I need to match D1 against D2 and assign 'sd_value_v1' and 'sd_value_v2' according to value_1 and value_2.

D2 specifies that if value_1 is less than or equal to 0.2, then an sd_value_v1 of 1 is assigned to it. Similarly, if the value is greater than 0.2 and less than or equal to 0.4, then an sd_value_v1 of 2 is assigned to the respective value of value_1.

Example:

value_1 = 0.10

Then on matching with D2, I should get an sd_value_v1 of 1.

Sample ranges for v1 (v2 works the same way, using the range_v2 breakpoints from D2):

0 to 0.2 --> 1

0.21 to 0.4 --> 2

0.41 to 0.6 --> 3

0.61 to 0.8 --> 4

0.81 to 1.0 --> 5

Expected output:

---------------------------------------------
value_1 | sd_value_v1 | value_2 | sd_value_v2
---------------------------------------------
  0.05  |      1      |   0.56  |     2
  0.10  |      1      |   0.78  |     3
  0.80  |      4      |   0.98  |     4
  0.45  |      3      |   1.50  |     4
  0.06  |      1      |   2.79  |     5
---------------------------------------------

I am currently using R to solve this problem. Any input would be really helpful.

In base R, we can use mapply with cut, taking the breaks from the range.. columns and the labels from the sd.. columns, to get the sd_value columns.

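# x: a column of df1; y: the matching range.. column of df2; z: the matching sd.. column
# c(-Inf, y) adds an open lower bound, so anything up to the first break gets the first label
df1[paste0("sd_value", seq_len(ncol(df1)))] <- 
      mapply(function(x, y, z) cut(x, breaks = c(-Inf, y), labels = z), 
      df1, df2[c(TRUE, FALSE)], df2[c(FALSE, TRUE)])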

df1
#  value_1 value_2 sd_value1 sd_value2
#1    0.05    0.56         1         2
#2    0.10    0.78         1         3
#3    0.80    0.98         4         4
#4    0.45    1.50         3         4
#5    0.06    2.79         1         5
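
To see what cut is doing for a single column, here is a minimal sketch on value_1 alone. The names breaks_v1, labels_v1, binned and sd_value_v1 are only illustrative (they are not part of the answer above), and the numbers are copied by hand from D1 and D2.

```r
# Breakpoints and labels for value_1, copied from D2 (range_v1 / sd_value_v1).
breaks_v1 <- c(0.2, 0.4, 0.6, 0.8, 1.0)
labels_v1 <- 1:5
value_1   <- c(0.05, 0.10, 0.80, 0.45, 0.06)

# cut() builds right-closed intervals by default: (-Inf, 0.2], (0.2, 0.4], ...
# so 0.80 lands in (0.6, 0.8] and gets label 4.
binned <- cut(value_1, breaks = c(-Inf, breaks_v1), labels = labels_v1)

# cut() returns a factor; go through as.character() to recover the labels as integers.
sd_value_v1 <- as.integer(as.character(binned))
print(sd_value_v1)
```

Note that a value above the last break (for example 1.2 here) falls outside every interval, so cut returns NA for it.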

Which columns you select depends on how they are arranged in your actual df2. In the example shown, the range.. and sd_value.. columns alternate, so I used df2[c(TRUE, FALSE)] and df2[c(FALSE, TRUE)] to pick every other column. If that is not the case in your real data, you can use grep to get the column indices from their names:

range_cols <- grep("^range", names(df2))
sd_cols <- grep("^sd", names(df2))

and then use them in mapply like this:

df1[paste0("sd_value", seq_len(ncol(df1)))] <- 
          mapply(function(x, y, z) cut(x, breaks = c(-Inf, y), labels = z), 
          df1, df2[range_cols], df2[sd_cols])
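
As a rough illustration of why the name-based selection helps, the sketch below assumes a hypothetical df2_grouped in which both range columns come first, so the alternating TRUE/FALSE trick would pick the wrong columns. It also swaps mapply for Map and converts each result to integer; that is a variation of mine, not the code above.

```r
df1 <- data.frame(value_1 = c(0.05, 0.10, 0.80, 0.45, 0.06),
                  value_2 = c(0.56, 0.78, 0.98, 1.50, 2.79))

# Hypothetical layout: range columns first, then the sd_value columns.
df2_grouped <- data.frame(range_v1    = c(0.2, 0.4, 0.6, 0.8, 1.0),
                          range_v2    = c(0.50, 0.75, 0.90, 1.50, 3.00),
                          sd_value_v1 = 1:5,
                          sd_value_v2 = 1:5)

# Column positions found by name rather than by position pattern.
range_cols <- grep("^range", names(df2_grouped))
sd_cols    <- grep("^sd", names(df2_grouped))

# Map() returns one list element per column pair; each is converted to integer.
result <- df1
result[paste0("sd_value", seq_along(range_cols))] <- Map(
  function(x, y, z) as.integer(as.character(cut(x, breaks = c(-Inf, y), labels = z))),
  df1, df2_grouped[range_cols], df2_grouped[sd_cols])
print(result)
```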

Here is a method using the tidyverse:

library(tidyverse)
list(df1, df2[c(1, 3)], df2[c(2, 4)])  %>% 
   pmap(~  ..3[findInterval(..1, ..2, left.open = TRUE)+1]) %>%
   set_names(str_c("sd_value", seq_along(.))) %>%
     bind_cols(df1, .)
#   value_1 value_2 sd_value1 sd_value2
#1    0.05    0.56         1         2
#2    0.10    0.78         1         3
#3    0.80    0.98         4         4
#4    0.45    1.50         3         4
#5    0.06    2.79         1         5
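
The core of that pipeline is findInterval, which is base R. Here is a small sketch of the same lookup on value_2 alone, without the tidyverse; breaks_v2, labels_v2, idx and sd_value_v2 are illustrative names, with the numbers copied by hand from D1 and D2.

```r
# Breakpoints and labels for value_2, copied from D2 (range_v2 / sd_value_v2).
breaks_v2 <- c(0.50, 0.75, 0.90, 1.50, 3.00)
labels_v2 <- 1:5
value_2   <- c(0.56, 0.78, 0.98, 1.50, 2.79)

# findInterval() returns how many breaks lie below each value.
# left.open = TRUE makes the intervals right-closed, so 1.50 stays in (0.90, 1.50].
# Adding 1 turns "number of breaks below" into the position of the label to use.
idx <- findInterval(value_2, breaks_v2, left.open = TRUE) + 1

sd_value_v2 <- labels_v2[idx]
print(sd_value_v2)
```

As with cut, a value above the last break has no label: the index runs past the end of labels_v2 and the lookup gives NA.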

Data

df1 <- structure(list(value_1 = c(0.05, 0.1, 0.8, 0.45, 0.06), value_2 = c(0.56, 
0.78, 0.98, 1.5, 2.79)), class = "data.frame", row.names = c(NA, 
-5L))

df2 <- structure(list(range_v1 = c(0.2, 0.4, 0.6, 0.8, 1), sd_value_v1 = 1:5, 
    range_v2 = c(0.5, 0.75, 0.9, 1.5, 3), sd_value_v2 = 1:5), 
    class = "data.frame", row.names = c(NA, 
-5L))
