How to add a set of values to an existing data frame?
I have a data frame containing three columns: ID, year, growth. The last one contains the growth in millimeters for each year.
Example:
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))
In another data frame, I have three other columns: ID, missing_length, missing_years. missing_length is the estimated total length that the measurements missed. missing_years is the number of years with missing growth values in df for that ID.
estimate <- data.frame(ID = c("CHC01", "CHC02", "CHC03"),
                       missing_length = c(1.0, 4.4, 3.5),
                       missing_years = c(1, 3, 2))
To calculate the growth for each missing year, I tried:
missing <- rep(estimate$missing_length / estimate$missing_years, estimate$missing_years)
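As a quick sanity check on this attempt, the sketch below rebuilds the example data and confirms that the vector holds exactly one value per NA in df$growth, which is why it can be dropped into the NA positions when both tables are in the same ID order. The comparison at the end is an illustrative check, not part of the original attempt.

```r
# Example data from the question
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))
estimate <- data.frame(ID = c("CHC01", "CHC02", "CHC03"),
                       missing_length = c(1.0, 4.4, 3.5),
                       missing_years = c(1, 3, 2))

# Per-year estimate for each ID, repeated once per missing year
missing <- rep(estimate$missing_length / estimate$missing_years,
               estimate$missing_years)

# TRUE when there is one estimated value for every NA in df$growth
length(missing) == sum(is.na(df$growth))
```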
Does anyone know how to add these values to df in place of the NAs?
Thank you very much!
We can do a join and then replace the NA with the calculated value:
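library(dplyr)
df %>%
  # bring missing_length and missing_years onto each row of df
  left_join(estimate, by = "ID") %>%
  group_by(ID) %>%
  # fill each NA with the per-year estimate for that ID
  transmute(year, growth = replace(growth, is.na(growth),
                                   missing_length[1] / missing_years[1]))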
# A tibble: 12 x 3
# Groups: ID [3]
# ID year growth
# <fct> <int> <dbl>
# 1 CHC01 2015 1
# 2 CHC01 2016 2.3
# 3 CHC01 2017 2.1
# 4 CHC01 2018 3
# 5 CHC02 2015 1.47
# 6 CHC02 2016 1.47
# 7 CHC02 2017 1.47
# 8 CHC02 2018 3.2
# 9 CHC03 2015 1.75
#10 CHC03 2016 1.75
#11 CHC03 2017 2.1
#12 CHC03 2018 1.2
Or with coalesce:
df %>%
  mutate(growth = coalesce(growth, with(estimate,
    setNames(missing_length / missing_years, ID))[as.character(ID)])) %>%
  as_tibble
# A tibble: 12 x 3
# ID year growth
# <fct> <int> <dbl>
# 1 CHC01 2015 1
# 2 CHC01 2016 2.3
# 3 CHC01 2017 2.1
# 4 CHC01 2018 3
# 5 CHC02 2015 1.47
# 6 CHC02 2016 1.47
# 7 CHC02 2017 1.47
# 8 CHC02 2018 3.2
# 9 CHC03 2015 1.75
#10 CHC03 2016 1.75
#11 CHC03 2017 2.1
#12 CHC03 2018 1.2
Or a similar option in data.table:
library(data.table)
setDT(df)[estimate, growth := fcoalesce(growth,
missing_length/missing_years), on = .(ID)]
Base R solution. Supposing the tables "df" and "estimate" are both sorted by ID (ascending CHC) and we keep your "missing" object, this should work:
df$growth <- replace(df$growth, which(is.na(df$growth)), missing)
Output:
ID year growth
1 CHC01 2015 1.000000
2 CHC01 2016 2.300000
3 CHC01 2017 2.100000
4 CHC01 2018 3.000000
5 CHC02 2015 1.466667
6 CHC02 2016 1.466667
7 CHC02 2017 1.466667
8 CHC02 2018 3.200000
9 CHC03 2015 1.750000
10 CHC03 2016 1.750000
11 CHC03 2017 2.100000
12 CHC03 2018 1.200000
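If the two tables are not guaranteed to be in the same order, a base R sketch that looks up each row's estimate by ID with match() avoids the sorting assumption. The names per_year and idx are illustrative and do not come from the original answer.

```r
# Example data from the question
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))
estimate <- data.frame(ID = c("CHC01", "CHC02", "CHC03"),
                       missing_length = c(1.0, 4.4, 3.5),
                       missing_years = c(1, 3, 2))

# Estimated growth per missing year, one value per row of estimate
per_year <- estimate$missing_length / estimate$missing_years

# For every row of df, the position of its ID in estimate
idx <- match(df$ID, estimate$ID)

# Keep measured values; fill NAs with the estimate for that row's ID
df$growth <- ifelse(is.na(df$growth), per_year[idx], df$growth)
df
```

Because the lookup is by ID rather than by position, the result does not change if the rows of estimate are reordered.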