Add a new column to a dataframe based on multiple columns from another dataframe
I have information from two dataframes: df1 contains information on individuals, and df2 contains information on the parents of these individuals.
> df1
ID Obs sire dam
1 313425 Obs1 241600 238895
2 313425 Obs2 241600 238895
3 313425 Obs3 241600 238895
4 313531 Obs2 239742 241447
5 315760 Obs2 238355 236642
6 315760 Obs1 238355 236642
And
> df2
Animal Obs Obs_value
1 241600 Obs1 19.9
2 239742 Obs1 19.6
3 238355 Obs1 18.5
4 238895 Obs1 20.1
5 241447 Obs1 22.0
6 236642 Obs1 19.8
7 241600 Obs2 1.9
8 239742 Obs2 1.6
9 238355 Obs2 1.5
10 238895 Obs2 2.1
11 241447 Obs2 2.0
12 236642 Obs2 1.8
13 241600 Obs3 1
14 239742 Obs3 1
15 238355 Obs3 1
16 238895 Obs3 1
17 241447 Obs3 0
18 236642 Obs3 1
I want to add information from df2 to df1 by matching df1$Obs and df1$sire (or df1$dam) against df2$Obs and df2$Animal, and returning df2$Obs_value into df1. Example of desired output:
> df1
ID Obs sire dam sire_value dam_value
1 313425 Obs1 241600 238895 19.9 20.1
2 313425 Obs2 241600 238895 1.9 2.1
3 313425 Obs3 241600 238895 1 1
4 313531 Obs2 239742 241447 1.6 2.0
5 315760 Obs2 238355 236642 1.5 1.8
6 315760 Obs1 238355 236642 18.5 19.8
I've tried the following code, but it doesn't give the correct results (or any results at all).
> df1 <- df1 %>% mutate(sire_value = left_join(df1, df2, by.x = c("ID", "Obs"), by.y = c("Animal", "Obs")))
Joining, by = "Obs"
Error: Problem with `mutate()` input `sire_value`.
x Input `sire_value` can't be recycled to size 6.
i Input `sire_value` is `left_join(...)`.
i Input `sire_value` must be size 6 or 1, not 36.
Run `rlang::last_error()` to see where the error occurred.
Can anyone help me with this? Much appreciated!
A general, scalable solution is to reshape the data to long format, join it, and then reshape it back to wide format:
library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(cols = c(sire, dam)) %>%
  left_join(df2, by = c('Obs', 'value' = 'Animal')) %>%
  pivot_wider(names_from = name, values_from = c(Obs_value, value))
# ID Obs Obs_value_sire Obs_value_dam value_sire value_dam
# <int> <chr> <dbl> <dbl> <int> <int>
#1 313425 Obs1 19.9 20.1 241600 238895
#2 313425 Obs2 1.9 2.1 241600 238895
#3 313425 Obs3 1 1 241600 238895
#4 313531 Obs2 1.6 2 239742 241447
#5 315760 Obs2 1.5 1.8 238355 236642
#6 315760 Obs1 18.5 19.8 238355 236642
If you only have two columns to join, as in this example, you can join them individually.
# look up the sire's value, then the dam's value, each keyed on (Obs, Animal)
df1 %>%
  left_join(df2 %>% rename(sire_value = Obs_value),
            by = c('Obs', 'sire' = 'Animal')) %>%
  left_join(df2 %>% rename(dam_value = Obs_value),
            by = c('Obs', 'dam' = 'Animal'))
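For reference, the `by.x` / `by.y` arguments used in the question belong to base R's `merge()`, not to `left_join()`. That is presumably why dplyr fell back to joining on `Obs` alone and returned 36 rows instead of 6. Below is a minimal base R sketch of the same two-step lookup. It assumes the data shown in the question (rebuilt here so the example is self-contained); the name `result` is only illustrative.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  ID   = c(313425L, 313425L, 313425L, 313531L, 315760L, 315760L),
  Obs  = c("Obs1", "Obs2", "Obs3", "Obs2", "Obs2", "Obs1"),
  sire = c(241600L, 241600L, 241600L, 239742L, 238355L, 238355L),
  dam  = c(238895L, 238895L, 238895L, 241447L, 236642L, 236642L)
)
animals <- c(241600L, 239742L, 238355L, 238895L, 241447L, 236642L)
df2 <- data.frame(
  Animal    = rep(animals, times = 3),
  Obs       = rep(c("Obs1", "Obs2", "Obs3"), each = 6),
  Obs_value = c(19.9, 19.6, 18.5, 20.1, 22.0, 19.8,
                1.9, 1.6, 1.5, 2.1, 2.0, 1.8,
                1, 1, 1, 1, 0, 1)
)

# Left join on (sire, Obs) = (Animal, Obs), then rename the looked-up column
m <- merge(df1, df2, by.x = c("sire", "Obs"), by.y = c("Animal", "Obs"),
           all.x = TRUE)
names(m)[names(m) == "Obs_value"] <- "sire_value"

# Same again for the dam
m <- merge(m, df2, by.x = c("dam", "Obs"), by.y = c("Animal", "Obs"),
           all.x = TRUE)
names(m)[names(m) == "Obs_value"] <- "dam_value"

# merge() reorders rows and columns, so sort by ID and Obs and restore
# the column order
result <- m[order(m$ID, m$Obs),
            c("ID", "Obs", "sire", "dam", "sire_value", "dam_value")]
print(result)
```

Note that `all.x = TRUE` keeps every row of df1 even when a parent has no matching record in df2; the corresponding value is then `NA`.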
You can also approach this problem with the data.table package. data.table is quite efficient for working with data frames and data.tables, so you will probably want to use it later on as well.
# setup environment
library('data.table')
# convert df1 and df2 to data.tables by reference
setDT(df1); setDT(df2)
# inner join on the 'sire' column; names must match the data (Animal, Obs, Obs_value)
df1 = df1[df2, on = .(sire = Animal, Obs = Obs), nomatch = 0L]
# rename the joined 'Obs_value' column to 'value_sire'
setnames(df1, 'Obs_value', 'value_sire')
# inner join on the 'dam' column
df1 = df1[df2, on = .(dam = Animal, Obs = Obs), nomatch = 0L]
# rename the new 'Obs_value' column to 'value_dam'
setnames(df1, 'Obs_value', 'value_dam')
# sort by 'sire' in descending order
setorder(df1, -sire)
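As an alternative sketch, data.table's update join (`:=` inside a join) adds the new columns to df1 by reference. Unlike the inner joins above, it keeps every row of df1 in its original order and leaves `NA` where no match exists. The data is rebuilt from the question so the example is self-contained; the column names `sire_value` and `dam_value` follow the desired output in the question.

```r
library(data.table)

# Rebuild the example data from the question
df1 <- data.table(
  ID   = c(313425L, 313425L, 313425L, 313531L, 315760L, 315760L),
  Obs  = c("Obs1", "Obs2", "Obs3", "Obs2", "Obs2", "Obs1"),
  sire = c(241600L, 241600L, 241600L, 239742L, 238355L, 238355L),
  dam  = c(238895L, 238895L, 238895L, 241447L, 236642L, 236642L)
)
animals <- c(241600L, 239742L, 238355L, 238895L, 241447L, 236642L)
df2 <- data.table(
  Animal    = rep(animals, times = 3),
  Obs       = rep(c("Obs1", "Obs2", "Obs3"), each = 6),
  Obs_value = c(19.9, 19.6, 18.5, 20.1, 22.0, 19.8,
                1.9, 1.6, 1.5, 2.1, 2.0, 1.8,
                1, 1, 1, 1, 0, 1)
)

# Update join: for rows of df1 matching df2 on (sire = Animal, Obs),
# copy df2's Obs_value (referred to as i.Obs_value) into a new column
df1[df2, sire_value := i.Obs_value, on = .(sire = Animal, Obs)]

# Same lookup for the dam
df1[df2, dam_value := i.Obs_value, on = .(dam = Animal, Obs)]

print(df1)
```

This relies on each (Animal, Obs) pair appearing only once in df2, as it does in the example data.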
Let us know if your problem was solved.