Add a new column to a dataframe based on multiple columns from another dataframe

I have information in two data frames: df1 contains information on individuals, and df2 contains information on the parents of these individuals.

> df1
      ID  Obs   sire    dam
1 313425 Obs1 241600 238895
2 313425 Obs2 241600 238895
3 313425 Obs3 241600 238895
4 313531 Obs2 239742 241447
5 315760 Obs2 238355 236642
6 315760 Obs1 238355 236642

And

> df2
   Animal  Obs Obs_value
1  241600 Obs1      19.9
2  239742 Obs1      19.6
3  238355 Obs1      18.5
4  238895 Obs1      20.1
5  241447 Obs1      22.0
6  236642 Obs1      19.8
7  241600 Obs2       1.9
8  239742 Obs2       1.6
9  238355 Obs2       1.5
10 238895 Obs2       2.1
11 241447 Obs2       2.0
12 236642 Obs2       1.8
13 241600 Obs3         1
14 239742 Obs3         1
15 238355 Obs3         1
16 238895 Obs3         1
17 241447 Obs3         0
18 236642 Obs3         1

I want to add information from df2 to df1: match df1$Obs and df1$sire (or df1$dam) against df2$Obs and df2$Animal, and return the corresponding df2$Obs_value into df1. Example of desired output:

> df1
      ID  Obs   sire    dam sire_value dam_value
1 313425 Obs1 241600 238895       19.9      20.1
2 313425 Obs2 241600 238895        1.9       2.1
3 313425 Obs3 241600 238895          1         1
4 313531 Obs2 239742 241447        1.6       2.0
5 315760 Obs2 238355 236642        1.5       1.8
6 315760 Obs1 238355 236642       18.5      19.8

I've tried the following code, but it does not give the correct result (or any result at all).

> df1 <- df1 %>% mutate(sire_value = left_join(df1, df2, by.x = c("ID", "Obs"), by.y = c("Animal", "Obs")))
Joining, by = "Obs"
Error: Problem with `mutate()` input `sire_value`.
x Input `sire_value` can't be recycled to size 6.
i Input `sire_value` is `left_join(...)`.
i Input `sire_value` must be size 6 or 1, not 36.
Run `rlang::last_error()` to see where the error occurred.

Can anyone help me with this? Much appreciated!

A general, scalable solution is to reshape the data to long format, join, and reshape it back to wide format:

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(cols = c(sire, dam)) %>%
  left_join(df2, by = c('Obs', 'value' = 'Animal')) %>%
  pivot_wider(names_from = name, values_from = c(Obs_value, value))

#     ID Obs   Obs_value_sire Obs_value_dam value_sire value_dam
#   <int> <chr>          <dbl>         <dbl>      <int>     <int>
#1 313425 Obs1            19.9          20.1     241600    238895
#2 313425 Obs2             1.9           2.1     241600    238895
#3 313425 Obs3             1             1       241600    238895
#4 313531 Obs2             1.6           2       239742    241447
#5 315760 Obs2             1.5           1.8     238355    236642
#6 315760 Obs1            18.5          19.8     238355    236642
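To try the pipeline above end to end, here is a self-contained sketch. The data frames are retyped from the question. The final `select()` step is my addition, not part of the original answer: it only renames the widened columns to the `sire_value` / `dam_value` names the question asks for. Note that `pivot_wider()` here relies on `ID` and `Obs` together identifying each row of `df1`, which holds for this sample but is an assumption in general.

```r
library(dplyr)
library(tidyr)

# Sample data retyped from the question
df1 <- data.frame(
  ID   = c(313425L, 313425L, 313425L, 313531L, 315760L, 315760L),
  Obs  = c("Obs1", "Obs2", "Obs3", "Obs2", "Obs2", "Obs1"),
  sire = c(241600L, 241600L, 241600L, 239742L, 238355L, 238355L),
  dam  = c(238895L, 238895L, 238895L, 241447L, 236642L, 236642L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Animal    = rep(c(241600L, 239742L, 238355L, 238895L, 241447L, 236642L), times = 3),
  Obs       = rep(c("Obs1", "Obs2", "Obs3"), each = 6),
  Obs_value = c(19.9, 19.6, 18.5, 20.1, 22.0, 19.8,
                1.9, 1.6, 1.5, 2.1, 2.0, 1.8,
                1, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # one row per (individual, parent role); parent IDs land in 'value'
  pivot_longer(cols = c(sire, dam)) %>%
  # look up the parent's value for the same observation type
  left_join(df2, by = c("Obs", "value" = "Animal")) %>%
  # back to one row per individual and observation
  pivot_wider(names_from = name, values_from = c(Obs_value, value)) %>%
  # rename to the column names used in the question (added step)
  select(ID, Obs,
         sire = value_sire, dam = value_dam,
         sire_value = Obs_value_sire, dam_value = Obs_value_dam)

print(result)
```

Adding further parent-like columns only requires listing them in `cols` and in the final `select()`.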

If you only have two columns to join, as in this example, you can join them individually.

df1 %>%
  # look up the sire's value for the same observation type
  left_join(df2 %>% rename(sire_value = Obs_value),
            by = c('Obs', 'sire' = 'Animal')) %>%
  # look up the dam's value for the same observation type
  left_join(df2 %>% rename(dam_value = Obs_value),
            by = c('Obs', 'dam' = 'Animal'))
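If you would rather avoid extra packages for the two-column case, the same lookup can be sketched in base R with `match()` on a pasted key. This is a swapped-in technique, not what the answer above uses. The helper name `lookup_value` is hypothetical, the data is retyped from the question, and the sketch assumes each (Animal, Obs) pair appears only once in `df2`, since `match()` returns the first hit.

```r
# Sample data retyped from the question
df1 <- data.frame(
  ID   = c(313425L, 313425L, 313425L, 313531L, 315760L, 315760L),
  Obs  = c("Obs1", "Obs2", "Obs3", "Obs2", "Obs2", "Obs1"),
  sire = c(241600L, 241600L, 241600L, 239742L, 238355L, 238355L),
  dam  = c(238895L, 238895L, 238895L, 241447L, 236642L, 236642L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Animal    = rep(c(241600L, 239742L, 238355L, 238895L, 241447L, 236642L), times = 3),
  Obs       = rep(c("Obs1", "Obs2", "Obs3"), each = 6),
  Obs_value = c(19.9, 19.6, 18.5, 20.1, 22.0, 19.8,
                1.9, 1.6, 1.5, 2.1, 2.0, 1.8,
                1, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return Obs_value for each (parent, obs) pair,
# or NA where the pair is not present in 'ref'
lookup_value <- function(parent, obs, ref) {
  ref_key <- paste(ref$Animal, ref$Obs)
  ref$Obs_value[match(paste(parent, obs), ref_key)]
}

df1$sire_value <- lookup_value(df1$sire, df1$Obs, df2)
df1$dam_value  <- lookup_value(df1$dam,  df1$Obs, df2)

print(df1)
```

Because this assigns columns in place, the row order and row count of `df1` are untouched.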

I would approach your problem using the data.table package. data.table handles data frames and data.tables efficiently, so you may want to use it later on anyway.

# setup environment
library('data.table')
library('dplyr')

# convert df1 and df2 to data.tables by reference
setDT(df1); setDT(df2)
# inner join on the 'sire' column and the observation type
df1 = df1[df2, on = .(sire = Animal, Obs = Obs), nomatch = 0L]
# rename the joined 'Obs_value' column to 'value_sire'
df1 = df1 %>%
  rename(value_sire = Obs_value)
# inner join on the 'dam' column and the observation type
df1 = df1[df2, on = .(dam = Animal, Obs = Obs), nomatch = 0L]
# rename the new 'Obs_value' column to 'value_dam'
df1 = df1 %>%
  rename(value_dam = Obs_value) %>%
  arrange(desc(sire))
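As a variation on the joins above, data.table also supports an update join, which adds the looked-up column to the first table by reference and so needs no renaming step. The sketch below is a different formulation from the answer's code, with the data retyped from the question and the names `dt1` / `dt2` chosen for illustration. It keeps every row of `dt1` in its original order and leaves `NA` where no match exists.

```r
library(data.table)

# Sample data retyped from the question
dt1 <- data.table(
  ID   = c(313425L, 313425L, 313425L, 313531L, 315760L, 315760L),
  Obs  = c("Obs1", "Obs2", "Obs3", "Obs2", "Obs2", "Obs1"),
  sire = c(241600L, 241600L, 241600L, 239742L, 238355L, 238355L),
  dam  = c(238895L, 238895L, 238895L, 241447L, 236642L, 236642L)
)
dt2 <- data.table(
  Animal    = rep(c(241600L, 239742L, 238355L, 238895L, 241447L, 236642L), times = 3),
  Obs       = rep(c("Obs1", "Obs2", "Obs3"), each = 6),
  Obs_value = c(19.9, 19.6, 18.5, 20.1, 22.0, 19.8,
                1.9, 1.6, 1.5, 2.1, 2.0, 1.8,
                1, 1, 1, 1, 0, 1)
)

# Update join: for rows of dt1 matching dt2 on (parent, Obs),
# copy dt2's Obs_value (referred to as i.Obs_value) into a new column
dt1[dt2, on = .(sire = Animal, Obs = Obs), sire_value := i.Obs_value]
dt1[dt2, on = .(dam  = Animal, Obs = Obs), dam_value  := i.Obs_value]

print(dt1)
```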

Let us know if your problem was solved.

