
R - Add a calculated column to a summarized dataframe based on raw data and column from summarized df

I have a dataframe that contains some raw data. Let's use the sample dataset "iris" as an example.

# load a data sample
data("iris")

#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
# ...

I have another dataframe that contains summarized data on the species.

species <- data.frame(unique(iris$Species))
colnames(species) <- "s"

# Add a zoom level
species$zoom <- c(2,3,5)

#                      s  zoom
# 1               setosa     2
# 2           versicolor     3
# 3            virginica     5

I would like to add a calculated column to this summarized dataframe (called species in this example): the zoom level multiplied by the mean Sepal.Length of each species.

I tried both

species$mean <- species$zoom * mean(iris$Sepal.Length)
# (AND)
species$mean <- species$zoom * mean(iris$Sepal.Length[iris$Species==species$s])

The first one doesn't work because it computes the mean over all the raw data instead of grouping by species. The second one doesn't appear to work either.

Can I do this without looping over the rows?

Perhaps this data.table approach can help you out?

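data("iris")

library(data.table)

# Convert iris to a data.table by reference, compute the mean Sepal.Length
# per species, then multiply each group mean by its zoom level (2, 3, 5)
setDT(iris)[, list(mean = mean(Sepal.Length)), by = Species][, mean_mult := mean * c(2, 3, 5)][]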

#       Species  mean mean_mult
# 1:     setosa 5.006    10.012
# 2: versicolor 5.936    17.808
# 3:  virginica 6.588    32.940
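
If you would rather keep the zoom levels in the existing species dataframe than hard-code them, a base R sketch along these lines might also work. It assumes the species dataframe is built as in the question (column s holding the species, column zoom holding the multiplier); the intermediate name means is only illustrative. As far as I can tell, the second attempt in the question fails because iris$Species == species$s recycles the three species names along the 150 rows, so mean() returns one overall number rather than one value per species.

```r
data("iris")

# Summary table as in the question: one row per species plus a zoom level
species <- data.frame(s = unique(iris$Species))
species$zoom <- c(2, 3, 5)

# Per-species mean of Sepal.Length, named by species ('means' is an illustrative name)
means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Look up each row's species mean by name, then scale it by that row's zoom level
species$mean <- species$zoom * as.numeric(means[as.character(species$s)])

species
```

Looking the means up by name rather than by position means the result should not depend on the row order of species, which is the main reason to prefer this over multiplying by a fixed vector.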
