Multiplying data from two columns and adding it to an existing third column
I have a data frame with two columns containing plant leaf length and width data.
For the year 2019 I have a mix of data. Some data points only have length and width measurements. Other data points have all three measurements.
With this data I was able to calculate a conversion factor. In 2020 I only have the length and width measurements. Using the conversion factor, I want to calculate the area for 2020 and add it to the leaf area column without overwriting any of the 2019 area measurements.
df_all <- df_all %>% mutate(rep_leaf_length * rep_leaf_width * 0.790590)
This was my starting point before I realized I do not know how to get where I want.
Do you have an idea how to do the multiplication and add the result to the existing column, but only for 2020 or for rows where the area column is NA, so that existing area measurements are not overwritten?
Year rep_leaf_length rep_leaf_width rep_leaf_area
2019 37.400 NA NA
2019 21.036 8.080 132.914
2019 29.147 2.331 NA
2020 16.600 4.00 NA
2020 21.600 2.2 NA
Thanks a lot, Jan
I think you mean that you want to infer the (unmeasured) leaf area in 2020 from the (measured) leaf length and leaf width of that year. However, leaf area isn't simply length times width, since leaves aren't rectangular. Fortunately, you have some 2019 observations where length, width and area were all measured. If you compare length * width with the actual area for those complete 2019 observations, you get the ratio of actual area to (length * width). Since the leaves presumably have a fairly consistent shape, multiplying the 2020 (length * width) values by this ratio gives an estimated area.
Assuming I have interpreted your intentions correctly, we can work out the ratio of actual area to (length * width) in 2019 like this:
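Written as a formula (the symbols are introduced here for illustration: $A$ area, $L$ length, $W$ width), the ratio is averaged over the $n$ complete 2019 rows and then applied to each 2020 row $j$:

$$
r = \frac{1}{n}\sum_{i=1}^{n} \frac{A_i}{L_i\,W_i}, \qquad \hat{A}_j = r\,L_j\,W_j
$$

With the example data only one 2019 row is complete, so $r = 132.914 / (21.036 \times 8.080) \approx 0.782$.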
library(dplyr)

# Mean ratio of measured area to (length * width) over the complete 2019 rows
ratio <- df_all %>%
  filter(Year == 2019) %>%
  filter(complete.cases(.)) %>%
  summarize(ratio = mean(rep_leaf_area / (rep_leaf_length * rep_leaf_width))) %>%
  unlist()
ratio
#> ratio
#> 0.781981
We can then use the ratio like this:
df_all %>%
  mutate(rep_leaf_area = ifelse(Year == 2020,
                                rep_leaf_length * rep_leaf_width * ratio,
                                rep_leaf_area))
#> Year rep_leaf_length rep_leaf_width rep_leaf_area
#> 1 2019 37.400 NA NA
#> 2 2019 21.036 8.080 132.91400
#> 3 2019 29.147 2.331 NA
#> 4 2020 16.600 4.000 51.92354
#> 5 2020 21.600 2.200 37.15974
Note that this does not affect the 2019 area measurements.
Data
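The question also asks about filling any row whose area is NA, not only the 2020 rows. A minimal sketch of that variant, assuming the same example data and the same mean-ratio approach: `dplyr::coalesce()` keeps a measured area and falls back to the estimate only where the area is missing. Here the ratio is taken from all complete rows rather than filtering on `Year == 2019`; with this data only 2019 rows have measured areas, so the result is the same.

```r
library(dplyr)

# Same example data as in this answer
df_all <- data.frame(
  Year            = c(2019L, 2019L, 2019L, 2020L, 2020L),
  rep_leaf_length = c(37.4, 21.036, 29.147, 16.6, 21.6),
  rep_leaf_width  = c(NA, 8.08, 2.331, 4, 2.2),
  rep_leaf_area   = c(NA, 132.914, NA, NA, NA)
)

# Conversion factor: mean of area / (length * width) over complete rows
ratio <- df_all %>%
  filter(complete.cases(.)) %>%
  summarize(ratio = mean(rep_leaf_area / (rep_leaf_length * rep_leaf_width))) %>%
  pull(ratio)

# Keep measured areas; estimate only where the area is NA.
# Rows missing length or width stay NA because the product is NA.
filled <- df_all %>%
  mutate(rep_leaf_area = coalesce(rep_leaf_area,
                                  rep_leaf_length * rep_leaf_width * ratio))
filled
```

Unlike the `ifelse(Year == 2020, ...)` version, this also fills the third 2019 row, which has a length and width but no measured area.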
df_all <- structure(list(Year = c(2019L, 2019L, 2019L, 2020L, 2020L),
rep_leaf_length = c(37.4, 21.036, 29.147, 16.6, 21.6),
rep_leaf_width = c(NA, 8.08, 2.331, 4, 2.2),
rep_leaf_area = c(NA, 132.914, NA, NA, NA)),
class = "data.frame", row.names = c(NA, -5L))
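Use an index: calculate the areas of the indexed rows and assign them. Note that this stores the raw product length * width; multiply by your conversion factor if you want the estimated leaf area.
# Logical index of the 2020 rows
index <- dat$Year %in% 2020
# Row-wise product of length (column 2) and width (column 3)
areas <- apply(dat[index, 2:3], 1, prod)
# Write the products into the area column (column 4) for those rows only
dat[index, 4] <- areas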
# Year rep_leaf_length rep_leaf_width rep_leaf_area
# 1 2019 37.400 NA NA
# 2 2019 21.036 8.080 132.914
# 3 2019 29.147 2.331 NA
# 4 2020 16.600 4.000 66.400
# 5 2020 21.600 2.200 47.520
Data:
dat <- structure(list(Year = c(2019L, 2019L, 2019L, 2020L, 2020L), rep_leaf_length = c(37.4,
21.036, 29.147, 16.6, 21.6), rep_leaf_width = c(NA, 8.08, 2.331,
4, 2.2), rep_leaf_area = c(NA, 132.914, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
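A base R sketch of the same indexing idea that also applies the conversion factor, so the new values are estimated areas rather than raw length * width products. It assumes the factor 0.790590 quoted in the question, and it selects rows whose area is missing (with both length and width present) instead of hard-coding the year; `conv_factor` is an illustrative name.

```r
dat <- data.frame(
  Year            = c(2019L, 2019L, 2019L, 2020L, 2020L),
  rep_leaf_length = c(37.4, 21.036, 29.147, 16.6, 21.6),
  rep_leaf_width  = c(NA, 8.08, 2.331, 4, 2.2),
  rep_leaf_area   = c(NA, 132.914, NA, NA, NA)
)

conv_factor <- 0.790590  # conversion factor quoted in the question

# Rows with no measured area but with both length and width available
index <- is.na(dat$rep_leaf_area) &
  !is.na(dat$rep_leaf_length) & !is.na(dat$rep_leaf_width)

# Estimated area = length * width * conversion factor, for those rows only
dat$rep_leaf_area[index] <-
  dat$rep_leaf_length[index] * dat$rep_leaf_width[index] * conv_factor
dat
```

To restrict the fill to 2020 only, add `& dat$Year %in% 2020` to the index; measured 2019 areas are never touched either way.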