
R - Add a calculated column to a summarized dataframe based on raw data and column from summarized df

I have a dataframe that contains some raw data. Let's take an example and use the sample dataset "iris".

# load a data sample
data("iris")

#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
# ...

I have another dataframe that contains summarized data on the species.

species <- data.frame(unique(iris$Species))
colnames(species) <- "s"

# Add a zoom level
species$zoom <- c(2,3,5)

#            s zoom
# 1     setosa    2
# 2 versicolor    3
# 3  virginica    5

I would like to add a calculated column to this summarized dataframe (called species in this example).

I tried both of the following:

species$mean <- species$zoom * mean(iris$Sepal.Length)
# (AND)
species$mean <- species$zoom * mean(iris$Sepal.Length[iris$Species==species$s])

but the first one doesn't work because it computes the mean over all the raw data instead of grouping by species. The second one doesn't appear to work either.
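A small check may help show why the second attempt misbehaves. This is only a diagnostic sketch in base R; the names mask and single_mean are made up for illustration. Comparing the 150-element iris$Species with the 3-element species$s silently recycles the shorter vector, and mean() then returns a single number rather than one value per species.

```r
data("iris")
species <- data.frame(s = unique(iris$Species), zoom = c(2, 3, 5))

# species$s has 3 elements and iris$Species has 150, so `==` recycles
# species$s as setosa, versicolor, virginica, setosa, ... down the rows.
mask <- iris$Species == species$s

length(mask)  # 150: one TRUE/FALSE per row of iris, not one per species
sum(mask)     # only rows that happen to line up with the recycled pattern

# mean() collapses the selected rows into one number, so every
# species ends up multiplied by the same value.
single_mean <- mean(iris$Sepal.Length[mask])
length(single_mean)  # 1
```

No warning is raised because 150 is a multiple of 3, which makes the problem easy to miss.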

Could I do this without looping on rows?

Perhaps this data.table approach can help you out?

data("iris")

library(data.table)
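# Mean Sepal.Length per species, then multiply by the zoom levels.
# Note: c(2,3,5) is matched by position, so it must follow the species order of the grouped result.
setDT( iris )[ , list( mean = mean( Sepal.Length ) ), by=Species][, mean_mult := mean * c(2,3,5)][]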

#       Species  mean mean_mult
# 1:     setosa 5.006    10.012
# 2: versicolor 5.936    17.808
# 3:  virginica 6.588    32.940
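
If you would rather not depend on data.table, the same numbers can probably be obtained in base R. The following is a minimal sketch, assuming the species table from the question (columns s and zoom); sepal_means is a name introduced here for illustration. It looks up each per-species mean by species name instead of by position, so the zoom values come from the species table itself and its row order does not matter.

```r
data("iris")
species <- data.frame(s = unique(iris$Species), zoom = c(2, 3, 5))

# Per-species mean of Sepal.Length; the result is named by species.
sepal_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Index by species name so each zoom value meets its own species' mean,
# regardless of the row order of `species`.
species$mean <- species$zoom * as.numeric(sepal_means[as.character(species$s)])

species
```

No explicit loop over rows is needed: tapply() does the grouping and the multiplication is vectorized.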


 