How to add a set of values to an existing data frame?

I have a data frame containing three columns: ID, year, growth. The last one contains the growth in millimeters for each year.

Example:

df <- data.frame(ID=rep(c("CHC01", "CHC02", "CHC03"), each=4), 
                 year=rep(2015:2018, 3), 
                 growth=c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))

In another data frame, I have three other columns: ID, missing_length, missing_years. missing_length is the estimated length that was missed in the measurements. missing_years is the number of missing years in df.

estimate <- data.frame(ID=c("CHC01", "CHC02", "CHC03"), 
                       missing_length=c(1.0, 4.4, 3.5), 
                       missing_years=c(1,3,2))

To calculate the growth for each missing year, I tried:

missing <- rep(estimate$missing_length / estimate$missing_years, estimate$missing_years)
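For reference, a short sketch of what that expression produces with the example data above. The intermediate name per_year is only introduced here for illustration.

```r
# Example data copied from the question
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))
estimate <- data.frame(ID = c("CHC01", "CHC02", "CHC03"),
                       missing_length = c(1.0, 4.4, 3.5),
                       missing_years = c(1, 3, 2))

# Estimated growth per missing year for each ID (hypothetical helper name)
per_year <- estimate$missing_length / estimate$missing_years

# Repeat each ID's value once per missing year
missing <- rep(per_year, estimate$missing_years)

round(missing, 2)
# 1.00 1.47 1.47 1.47 1.75 1.75

# There is exactly one value per NA in df$growth
length(missing) == sum(is.na(df$growth))
# TRUE
```

So missing holds one estimate per NA, ordered by the rows of estimate; what remains is putting those values into the right rows of df.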

Does anyone have any idea how to deal with this problem?

Thank you very much!

We can do a join and then replace the NA values with the calculated value:

library(dplyr)
df %>% 
   left_join(estimate, by = "ID") %>% 
   group_by(ID) %>% 
   transmute(year, growth  = replace(growth, is.na(growth), 
                 missing_length[1]/missing_years[1]))
# A tibble: 12 x 3
# Groups:   ID [3]
#   ID     year growth
#   <fct> <int>  <dbl>
# 1 CHC01  2015   1   
# 2 CHC01  2016   2.3 
# 3 CHC01  2017   2.1 
# 4 CHC01  2018   3   
# 5 CHC02  2015   1.47
# 6 CHC02  2016   1.47
# 7 CHC02  2017   1.47
# 8 CHC02  2018   3.2 
# 9 CHC03  2015   1.75
#10 CHC03  2016   1.75
#11 CHC03  2017   2.1 
#12 CHC03  2018   1.2 
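The same join-then-fill idea can be sketched in base R with merge(), followed by a quick sanity check that the filled-in values for each ID add back up to missing_length. This is only an illustration under the assumption that every NA for an ID is one of its missing years; the names joined, was_na and filled_sum are made up here.

```r
# Example data copied from the question
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))
estimate <- data.frame(ID = c("CHC01", "CHC02", "CHC03"),
                       missing_length = c(1.0, 4.4, 3.5),
                       missing_years = c(1, 3, 2))

# Base R counterpart of the left join: each df row gains its ID's estimates
joined <- merge(df, estimate, by = "ID", all.x = TRUE)
joined <- joined[order(joined$ID, joined$year), ]

# Fill only the rows where growth is NA
was_na <- is.na(joined$growth)
joined$growth[was_na] <- (joined$missing_length / joined$missing_years)[was_na]

# Sanity check: filled-in values per ID should sum to missing_length
filled_sum <- tapply(joined$growth[was_na], joined$ID[was_na], sum)
all.equal(as.numeric(filled_sum[as.character(estimate$ID)]),
          estimate$missing_length)
```

If the check does not return TRUE, the number of NA rows for some ID does not match its missing_years.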

Or with coalesce:

df %>%
   mutate(growth = coalesce(growth,  with(estimate, 
        setNames(missing_length/missing_years, ID))[as.character(ID)])) %>%
   as_tibble
# A tibble: 12 x 3
#   ID     year growth
#   <fct> <int>  <dbl>
# 1 CHC01  2015   1   
# 2 CHC01  2016   2.3 
# 3 CHC01  2017   2.1 
# 4 CHC01  2018   3   
# 5 CHC02  2015   1.47
# 6 CHC02  2016   1.47
# 7 CHC02  2017   1.47
# 8 CHC02  2018   3.2 
# 9 CHC03  2015   1.75
#10 CHC03  2016   1.75
#11 CHC03  2017   2.1 
#12 CHC03  2018   1.2 

Or a similar option in data.table:

library(data.table)
setDT(df)[estimate, growth := fcoalesce(growth, 
           missing_length/missing_years), on = .(ID)]

Base R solution. Supposing the tables "df" and "estimate" are both sorted by ID (ascending CHC), the number of NA rows for each ID equals its missing_years, and we keep your "missing" object, this should work:

df$growth <- replace(df$growth, which(is.na(df$growth)), missing)

Output:

      ID year   growth
1  CHC01 2015 1.000000
2  CHC01 2016 2.300000
3  CHC01 2017 2.100000
4  CHC01 2018 3.000000
5  CHC02 2015 1.466667
6  CHC02 2016 1.466667
7  CHC02 2017 1.466667
8  CHC02 2018 3.200000
9  CHC03 2015 1.750000
10 CHC03 2016 1.750000
11 CHC03 2017 2.100000
12 CHC03 2018 1.200000
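If the sort order cannot be relied on, one possible base R variant looks each ID up with match() instead of filling by position. This is a sketch, not part of the answer above; the rows of estimate are deliberately shuffled here to show that order no longer matters, and the names per_year and fill are illustrative.

```r
# df as in the question
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))

# Same estimates as in the question, but with the rows in a different order
estimate <- data.frame(ID = c("CHC03", "CHC01", "CHC02"),
                       missing_length = c(3.5, 1.0, 4.4),
                       missing_years = c(2, 1, 3))

# Per-year estimate for each row of estimate
per_year <- estimate$missing_length / estimate$missing_years

# For every row of df, pick the estimate belonging to that row's ID
fill <- per_year[match(df$ID, estimate$ID)]

# Keep measured values, use the estimate only where growth is NA
df$growth <- ifelse(is.na(df$growth), fill, df$growth)

round(df$growth, 2)
# 1.00 2.30 2.10 3.00 1.47 1.47 1.47 3.20 1.75 1.75 2.10 1.20
```

Unlike the positional replace(), this fills every NA for an ID with the same per-year value regardless of how many NA rows that ID has.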
