Apply a function over all subsets of a data frame

Question

How can I normalize the Sepal.Length values by species?

    iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
...

# I have to divide by
tapply(iris$Sepal.Length, iris$Species, max)
    setosa versicolor  virginica 
       5.8        7.0        7.9

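In other words, I want to divide all values where Species == "setosa" by 5.8, and so on. In the end I want a data frame whose Sepal.Length column holds normalized values in the range 0..1.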

The result should look like this:

    iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1      0.8793103         3.5          1.4         0.2     setosa
...

Answer 1

Obviously, there are many ways to do this. I like the syntax of ave() best (see DWin's answer), or that of the data.table package:

library(data.table)
dt <- data.table(iris)
dt[, Sepal.Length:=(Sepal.Length)/max(Sepal.Length), by="Species"]
dt
#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#   1:    0.8793103         3.5          1.4         0.2    setosa
#   2:    0.8448276         3.0          1.4         0.2    setosa
#   3:    0.8103448         3.2          1.3         0.2    setosa
#   4:    0.7931034         3.1          1.5         0.2    setosa
#   5:    0.8620690         3.6          1.4         0.2    setosa
#  ---
# 146:    0.8481013         3.0          5.2         2.3 virginica
# 147:    0.7974684         2.5          5.0         1.9 virginica
# 148:    0.8227848         3.0          5.2         2.0 virginica
# 149:    0.7848101         3.4          5.4         2.3 virginica
# 150:    0.7468354         3.0          5.1         1.8 virginica

df <- data.frame(dt) ## It's possible (but not necessary) to coerce back to
                     ## a plain old data.frame
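
Keep in mind that `:=` updates dt by reference, so the raw Sepal.Length values in dt are overwritten. Below is a minimal sketch of a non-destructive variant, assuming you would rather keep the original measurements. The column name `ratio_to_max` is only an illustrative choice (borrowed from the ave() answer), and `check` is a name made up for this example.

```r
library(data.table)

# as.data.table() makes a copy, so the built-in iris stays untouched
dt <- as.data.table(iris)

# Add a new column instead of overwriting Sepal.Length
dt[, ratio_to_max := Sepal.Length / max(Sepal.Length), by = Species]

# Sanity check: the largest ratio within each species should be 1
check <- dt[, .(max_ratio = max(ratio_to_max)), by = Species]
print(check)
```

Once the ratios look right, the original column can still be replaced in a later step.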

Answer 2

I'm taking your wish to divide by the maximum literally.

One option:

aggregate(iris$Sepal.Length,list(iris$Species),FUN = function(x) x/max(x))
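
One caveat with the aggregate() call: because the function returns a whole vector per group, the result is, as far as I can tell, one row per species with the scaled values packed into a matrix column, not a 150-row data frame. A quick sketch to inspect that shape (`res` is just a name chosen here):

```r
res <- aggregate(iris$Sepal.Length, list(iris$Species),
                 FUN = function(x) x / max(x))

# One row per species; the scaled values sit in the matrix column `x`
print(dim(res))     # shape of the outer data frame
print(dim(res$x))   # species x observations per species

# First scaled setosa value
print(res$x[1, 1])
```

So this option gives you the numbers, but you would still have to put them back into the original data frame yourself.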

And another, using ddply from plyr (and scaling all the columns at once):

library(plyr)  # provides ddply() and colwise()
ddply(iris,.(Species),colwise(function(x){x / max(x)}))

And, more like @DWin's ave example, keeping the other columns unchanged but using ddply:

ddply(iris,.(Species),transform,Sepal.Length = Sepal.Length / max(Sepal.Length))

Answer 3

iris$ratio_to_max <- ave(iris$Sepal.Length, list(iris$Species),
                         FUN = function(x) x / max(x))
#-------------
> str(iris)
'data.frame':   150 obs. of  6 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ratio_to_max: num  0.879 0.845 0.81 0.793 0.862 ...

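If you want to replace the Sepal.Length column you can do so, but I generally avoid such destructive practices until I'm sure I've got what I wanted. (And even then I feel guilty.) If you want this in separate list "packets" with the original "Sepal.Length" column dropped, you can use split: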

 spl.iris <- split(iris[-1], iris$Species)
 str(spl.iris)
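
To take the split route all the way, the pieces can be transformed one by one and then put back together. The following is only a sketch of that split-apply-combine idea in base R. `scale_by_max`, `pieces`, `iris2` and `via_ave` are names made up for this illustration, and it overwrites Sepal.Length in a copy, not in iris itself.

```r
# Hypothetical helper: divide Sepal.Length by its maximum within one piece
scale_by_max <- function(d) {
  d$Sepal.Length <- d$Sepal.Length / max(d$Sepal.Length)
  d
}

# Split by species, transform each piece, then restore the original row order
pieces <- lapply(split(iris, iris$Species), scale_by_max)
iris2  <- unsplit(pieces, iris$Species)

# Should agree with the ave() approach
via_ave <- ave(iris$Sepal.Length, iris$Species, FUN = function(x) x / max(x))
print(all.equal(iris2$Sepal.Length, via_ave))
head(iris2, 3)
```

unsplit() is used here rather than do.call(rbind, ...) because it puts the rows back in their original positions even when the data are not sorted by group.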

Answer 4

I'm sure there's a better plyr or data.table or even base way:

L1 <- lapply(split(iris[, -5], iris$Species), function(x) apply(x, 2, scale))
L2 <- lapply(seq_along(L1), function(i) {
    data.frame(Species=names(L1)[i], L1[[i]])
})
do.call(rbind, L2)
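
Note that scale() with its defaults centers each column and divides by its standard deviation, so the code above yields z-scores per species, not values divided by the group maximum. If the goal is the max-scaling from the question, a sketch along the same lines could look like this (`div_max`, `parts` and `out` are names invented for the example):

```r
# Divide every numeric column of one piece by that column's maximum
div_max <- function(x) as.data.frame(lapply(x, function(col) col / max(col)))

# One scaled data frame per species (column 5, Species, is left out)
parts <- lapply(split(iris[, -5], iris$Species), div_max)

# Re-attach the species label to each piece and stack the pieces
out <- do.call(rbind, Map(function(nm, d) data.frame(Species = nm, d),
                          names(parts), parts))
head(out, 3)
```

Like the original, this scales all four measurement columns at once; restrict it to Sepal.Length if that is the only column you need.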

4 answers

Answer 1
7 2012-10-25 20:19:01

Answer 2
5 accepted 2012-10-25 20:20:20

Answer 3
3 2012-10-25 20:27:08

Answer 4
0 2012-10-25 20:15:26
