
Multiplying data from two columns and adding it to an existing third column

I have a data frame with two columns containing plant leaf length and width data.

For the year 2019 I have a mix of data. Some data points only have length and area measurements. Some other data points have all three measurements.

With this data I was able to calculate a conversion factor. In 2020 I only have the length and width measurements. With the conversion factor I want to calculate the area for the year 2020 and add it to the leaf area column without overwriting any of the area measurements from 2019.

df_all <- df_all %>% mutate(rep_leaf_length * rep_leaf_width * 0.790590)

This was my starting point, before I realized I do not know how to get where I want.

Do you have an idea how to do the multiplication and add the result to the existing column, but only for 2020 (or only where the area column is NA), without touching the existing area measurements?

Year  rep_leaf_length   rep_leaf_width  rep_leaf_area
2019           37.400               NA             NA
2019           21.036            8.080        132.914
2019           29.147            2.331             NA
2020           16.600             4.00             NA
2020           21.600              2.2             NA

Thanks a lot, Jan

I think you mean that you want to infer the (unmeasured) leaf area for 2020, using the (measured) leaf length and leaf width from that year. However, the leaf area isn't a simple product of width and length, since leaves aren't rectangular. Fortunately, you have some observations from 2019 where length, width and area were all measured. That means if you compare length * width to the actual area for the complete 2019 observations, you get a ratio of actual area to (length * width). Since the leaves are presumably of relatively fixed shape, you can multiply the (length * width) values from 2020 by this ratio to get an estimated area.

Assuming I have interpreted your intentions correctly, we can work out the ratio of actual area to (width * length) in 2019 like this:

library(dplyr)

ratio <- df_all %>% 
  filter(Year == 2019) %>%
  filter(complete.cases(.)) %>%
  summarize(ratio = mean(rep_leaf_area / (rep_leaf_length * rep_leaf_width))) %>%
  unlist()

ratio
#>    ratio 
#> 0.781981

And we can use the ratio like this:

df_all %>% 
  mutate(rep_leaf_area = ifelse(Year == 2020,
                                rep_leaf_length * rep_leaf_width * ratio,
                                rep_leaf_area))
#>   Year rep_leaf_length rep_leaf_width rep_leaf_area
#> 1 2019          37.400             NA            NA
#> 2 2019          21.036          8.080     132.91400
#> 3 2019          29.147          2.331            NA
#> 4 2020          16.600          4.000      51.92354
#> 5 2020          21.600          2.200      37.15974

Note that this does not affect the 2019 area measurements.
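The question also mentions filling the area only where it is NA, rather than strictly by year. A minimal sketch of that variant follows, assuming dplyr's coalesce() is acceptable here; the name df_filled is introduced only for illustration, and the data frame is rebuilt from the question's sample rows so the block runs on its own.

```r
library(dplyr)

# Example data copied from the question
df_all <- data.frame(
  Year            = c(2019L, 2019L, 2019L, 2020L, 2020L),
  rep_leaf_length = c(37.4, 21.036, 29.147, 16.6, 21.6),
  rep_leaf_width  = c(NA, 8.08, 2.331, 4, 2.2),
  rep_leaf_area   = c(NA, 132.914, NA, NA, NA)
)

# Ratio of measured area to length * width, from complete 2019 rows only
complete_2019 <- df_all %>%
  filter(Year == 2019, !is.na(rep_leaf_width), !is.na(rep_leaf_area))
ratio <- mean(complete_2019$rep_leaf_area /
                (complete_2019$rep_leaf_length * complete_2019$rep_leaf_width))

# coalesce() keeps a measured area and falls back to the estimate
# only where the area is NA (df_filled is a hypothetical name)
df_filled <- df_all %>%
  mutate(rep_leaf_area = coalesce(rep_leaf_area,
                                  rep_leaf_length * rep_leaf_width * ratio))
```

Unlike the year-based ifelse(), this also estimates an area for 2019 rows that have length and width but no measured area. Rows where the width is missing stay NA, because the product itself is NA.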


Data

df_all <- structure(list(Year = c(2019L, 2019L, 2019L, 2020L, 2020L), 
           rep_leaf_length = c(37.4, 21.036, 29.147, 16.6, 21.6), 
           rep_leaf_width = c(NA, 8.08, 2.331, 4, 2.2), 
           rep_leaf_area = c(NA, 132.914, NA, NA, NA)), 
           class = "data.frame", row.names = c(NA, -5L))

Use an index. Calculate the areas of the indexed rows and assign them.

index <- dat$Year %in% 2020
areas <- apply(dat[index, 2:3], 1, prod)
dat[index, 4] <- areas  
#   Year rep_leaf_length rep_leaf_width rep_leaf_area
# 1 2019          37.400             NA            NA
# 2 2019          21.036          8.080       132.914
# 3 2019          29.147          2.331            NA
# 4 2020          16.600          4.000        66.400
# 5 2020          21.600          2.200        47.520

Data:

dat <- structure(list(Year = c(2019L, 2019L, 2019L, 2020L, 2020L), rep_leaf_length = c(37.4, 
21.036, 29.147, 16.6, 21.6), rep_leaf_width = c(NA, 8.08, 2.331, 
4, 2.2), rep_leaf_area = c(NA, 132.914, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))
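The indexed assignment above stores the plain length * width product. If the conversion factor from the question should be applied, and measured areas must never be overwritten, the same indexing idea might be extended as in this sketch. The names conv and idx are introduced here for illustration, and 0.790590 is the factor quoted in the question, not a value derived in this answer.

```r
# Example data copied from the question
dat <- data.frame(
  Year            = c(2019L, 2019L, 2019L, 2020L, 2020L),
  rep_leaf_length = c(37.4, 21.036, 29.147, 16.6, 21.6),
  rep_leaf_width  = c(NA, 8.08, 2.331, 4, 2.2),
  rep_leaf_area   = c(NA, 132.914, NA, NA, NA)
)

# Conversion factor as quoted in the question (assumed to be correct)
conv <- 0.790590

# Select 2020 rows whose area is still missing, so measured values are left alone
idx <- dat$Year == 2020 & is.na(dat$rep_leaf_area)

# Vectorised product scaled by the conversion factor, assigned only to those rows
dat$rep_leaf_area[idx] <- dat$rep_leaf_length[idx] * dat$rep_leaf_width[idx] * conv
```

Addressing the columns by name rather than by position (2:3 and 4) keeps the code working if the column order changes.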


 