Left join with multiple conditions in R

I'm trying to replace ids with their respective values. The problem is that each id maps to a different value depending on the preceding type column, like this:

>df
  type id 
1  q1   1
2  q1   2
3  q2   1
4  q2   3
5  q3   1
6  q3   2

Here are the ids for each type, with their values:

>q1
  id value
1 1  yes
2 2  no

>q2 
   id value
1  1  one hour
2  2  two hours
3  3  more than two hours

>q3
  id value
1 1  blue
2 2  yellow

I've tried something like this:

df <- left_join(subset(df, type %in% c("q1")), q1, by = "id")

But it removes the rows of the other types.

I'd like a one-liner solution (or close to it), because there are more than 20 of these type-description tables.

Any ideas on how to do it?

This is the df I'm expecting:

>df
  type id value
1  q1   1 yes
2  q1   2 no
3  q2   1 one hour
4  q2   3 more than two hours
5  q3   1 blue
6  q3   2 yellow

You can join on more than one variable. The expected df you give would actually make a suitable lookup table for this:

value_lookup <- data.frame(
  type = c('q1', 'q1', 'q2', 'q2', 'q3', 'q3'),
  id = c(1, 2, 1, 3, 1, 2),
  value = c('yes', 'no', 'one hour', 'more than two hours', 'blue', 'yellow')
)

Then you just merge on both type and id:

library(dplyr)  # provides left_join
df <- left_join(df, value_lookup, by = c('type', 'id'))

Usually when I need a lookup table like that, I store it in a CSV rather than write it all out in the code, but do whatever suits you.
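The CSV approach mentioned above could look roughly like the sketch below. The file layout (columns type, id, value) and the csv_path name are assumptions made for illustration; a temporary file stands in for a real lookup file on disk.

```r
library(dplyr)

df <- data.frame(
  type = c("q1", "q1", "q2", "q2", "q3", "q3"),
  id   = c(1L, 2L, 1L, 3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup file: a temp file stands in for a real CSV kept next to the code.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "type,id,value",
  "q1,1,yes",
  "q1,2,no",
  "q2,1,one hour",
  "q2,2,two hours",
  "q2,3,more than two hours",
  "q3,1,blue",
  "q3,2,yellow"
), csv_path)

# One long table holding every (type, id) pair and its value.
value_lookup <- read.csv(csv_path, stringsAsFactors = FALSE)

# Join on both keys; left_join keeps every row of df in its original order.
result <- left_join(df, value_lookup, by = c("type", "id"))
print(result)
```

Keeping all types in one long table means a new type only adds rows to the file, not another join in the code.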

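# Split df by type, merge each piece with the lookup table of the same name, then stack the results
tempList = split(df, df$type)
do.call(rbind,
          lapply(names(tempList), function(nm)
              merge(tempList[[nm]], get(nm))))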
#  id type               value
#1  1   q1                 yes
#2  2   q1                  no
#3  1   q2            one hour
#4  3   q2 more than two hours
#5  1   q3                blue
#6  2   q3              yellow
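One caveat with the split/merge approach: merge() performs an inner join by default, so a row of df whose id is missing from its lookup table is silently dropped. The sketch below is a variation, not the code above: it passes all.x = TRUE to keep such rows, and it holds the lookup tables in a named list (the name lookups is made up for this example) instead of fetching them with get().

```r
df <- data.frame(
  type = c("q1", "q1", "q2"),
  id   = c(1L, 3L, 1L),   # id 3 has no entry in the q1 lookup table
  stringsAsFactors = FALSE
)

# Hypothetical container: one lookup table per type, keyed by the type name.
lookups <- list(
  q1 = data.frame(id = 1:2, value = c("yes", "no"), stringsAsFactors = FALSE),
  q2 = data.frame(id = 1:3,
                  value = c("one hour", "two hours", "more than two hours"),
                  stringsAsFactors = FALSE)
)

pieces <- split(df, df$type)

# all.x = TRUE turns the default inner join into a left join,
# so unmatched rows of df survive with value = NA.
merged <- do.call(rbind, lapply(names(pieces), function(nm) {
  merge(pieces[[nm]], lookups[[nm]], by = "id", all.x = TRUE)
}))
print(merged)
```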

Use mget to collect the data.frame objects whose names match 'q\\d+' into a list, bind them into a single data.frame with bind_rows, using the object names to fill the 'type' column, and right_join the result with the dataset 'df'.

library(tidyverse)
mget(paste0("q", 1:3)) %>% 
    bind_rows(.id = 'type') %>% 
    right_join(df)
#  type id               value
#1   q1  1                 yes
#2   q1  2                  no
#3   q2  1            one hour
#4   q2  3 more than two hours
#5   q3  1                blue
#6   q3  2              yellow
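With more than 20 lookup tables, typing out the names is tedious. As a rough sketch, and assuming every lookup table is an object whose name matches q followed by digits and nothing else in the environment does, the names can be found with ls() and a pattern. The variable names env, lookup_names and value_lookup are made up for this example, and the join keys are spelled out explicitly rather than inferred.

```r
library(dplyr)

df <- data.frame(
  type = c("q1", "q1", "q2", "q2", "q3", "q3"),
  id   = c(1L, 2L, 1L, 3L, 1L, 2L),
  stringsAsFactors = FALSE
)
q1 <- data.frame(id = 1:2, value = c("yes", "no"), stringsAsFactors = FALSE)
q2 <- data.frame(id = 1:3,
                 value = c("one hour", "two hours", "more than two hours"),
                 stringsAsFactors = FALSE)
q3 <- data.frame(id = 1:2, value = c("blue", "yellow"), stringsAsFactors = FALSE)

# Find every object named q<digits> in the current environment.
env <- environment()
lookup_names <- ls(envir = env, pattern = "^q[0-9]+$")

# Stack the tables; .id stores each table's name in a new 'type' column.
value_lookup <- bind_rows(mget(lookup_names, envir = env), .id = "type")

# Join on both keys, keeping every row of df.
result <- left_join(df, value_lookup, by = c("type", "id"))
print(result)
```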

You can do it by a series of left joins:

df1 = left_join(df, q1, by='id') %>% filter(type=="q1")
> df1
  type id value
1   q1  1   yes
2   q1  2    no


df2 = left_join(df, q2, by='id') %>% filter(type=="q2")
> df2
  type id               value
1   q2  1            one hour
2   q2  3 more than two hours

df3 = left_join(df, q3, by='id') %>% filter(type=="q3")
> df3
  type id  value
1   q3  1   blue
2   q3  2 yellow

> rbind(df1,df2,df3)
  type id               value
1   q1  1                 yes
2   q1  2                  no
3   q2  1            one hour
4   q2  3 more than two hours
5   q3  1                blue
6   q3  2              yellow

A one-liner would be:

rbind(left_join(df, q1, by='id') %>% filter(type=="q1"),
      left_join(df, q2, by='id') %>% filter(type=="q2"),
      left_join(df, q3, by='id') %>% filter(type=="q3"))

If you have more lookup tables, you should probably loop through their names, running left_join for each one and appending the result with bind_rows:

vecQs = paste0("q", 1:3) # Names of the lookup tables: q1, q2, ...
result = tibble()

# Left-join df with each lookup table, keep only the rows of that type, and append them to result.
for(i in vecQs) {
     result = bind_rows(result, left_join(df, get(i), by='id') %>% filter(type==!!i))
}

This will give:

> result
# A tibble: 6 x 3
  type     id value              
  <chr> <int> <chr>              
1 q1        1 yes                
2 q1        2 no                 
3 q2        1 one hour           
4 q2        3 more than two hours
5 q3        1 blue               
6 q3        2 yellow
