Using factors in R functions

Question

I'm having some trouble using factors in functions, or even just in basic calculations. I have a data frame something like this (but with as many as 6000 different factor levels).

df <- data.frame(
  p  = runif(20) * 100,
  q  = sample(1:100, 20, replace = TRUE),
  ta = c("e","e","f","f","f","i","h","e","i","i","f","f","j","j","h","h","h","e","j","i"),
  tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b"),
  stringsAsFactors = TRUE  # keep ta and tt as factors (not the default since R 4.0)
)

Now price = p and quantity = q are my variables, and tt and ta are different factors.

Now, I would first like to find the average price per unit of q for each level of tt:

(p*q ) / sum(q) by tt

In this case that would give me a list of 3 different sums, for a, b and c (I have 6000 different factors, so I need to do it smartly :) ).

I have tried using split to make lists, and in this case I can get each individual tt factor to contain the prices and another for the quantity, but I can't seem to get them to, for example, compute an average. I've also tried tapply, but again I can't see how to incorporate factors into it.

EDIT: I can see I need to clarify:

I need to find 3 sums, the average price per q given each factor, so in this simplified case it would be:

a: (sum of p*q for rows 1, 2, 3, 7, 8, 11, 13, 14, 18) / (sum of q for rows 1, 2, 3, 7, 8, 11, 13, 14, 18)

So the result should be the average price for a, b and c, which is just 3 values.

Answer 1

I'd use plyr to do this:

library(plyr)
ddply(df, .(tt), mutate, new_col = (p*q) / sum(q))
          p  q ta tt     new_col
1  73.92499 70  e  a 11.29857879
2  58.49011 60  e  a  7.66245932
3  17.23246 27  f  a  1.01588711
4  64.74637 42  h  a  5.93743967
5  55.89372 45  e  a  5.49174103
6  25.87318 83  f  a  4.68880732
7  12.35469 23  j  a  0.62043207
8   1.19060 83  j  a  0.21576367
9  84.18467 25  e  a  4.59523322
10 73.59459 66  f  b 10.07726727
11 26.12099 99  f  b  5.36509998
12 25.63809 80  i  b  4.25528535
13 54.74334 90  f  b 10.22178577
14 69.45430 50  h  b  7.20480246
15 52.71006 97  i  b 10.60762667
16 17.78591 54  i  c  5.16365066
17  0.15036 41  i  c  0.03314388
18 85.57796 30  h  c 13.80289670
19 54.38938 44  h  c 12.86630433
20 44.50439 17  j  c  4.06760541
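Note that the mutate call above returns one row per observation; summing new_col within each level of tt gives the three values the question asks for. A minimal base R sketch that collapses straight to one value per level, using the split and tapply tools mentioned in the question (the names avg_price and via_tapply, and the set.seed call, are illustrative additions, not part of the original answer):

```r
# Rebuild the example data; set.seed is added only so the run is repeatable.
set.seed(1)
df <- data.frame(
  p  = runif(20) * 100,
  q  = sample(1:100, 20, replace = TRUE),
  ta = c("e","e","f","f","f","i","h","e","i","i","f","f","j","j","h","h","h","e","j","i"),
  tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b"),
  stringsAsFactors = TRUE
)

# Split the rows by tt, then compute sum(p * q) / sum(q) inside each piece.
avg_price <- sapply(split(df, df$tt), function(g) sum(g$p * g$q) / sum(g$q))
avg_price  # named numeric vector with one entry per level of tt

# The same result with tapply: group-wise sums of p * q divided by group-wise sums of q.
via_tapply <- tapply(df$p * df$q, df$tt, sum) / tapply(df$q, df$tt, sum)

# With plyr loaded, the equivalent collapse would be:
# ddply(df, .(tt), summarise, avg_price = sum(p * q) / sum(q))
```

Both variants scale to many levels because the grouping is driven by the factor itself rather than by a hand-written list of level names.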

plyr does have a reputation for being slow; data.table provides similar functionality but much higher performance.
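A rough data.table sketch of the same per-level calculation, assuming the data.table package is installed. The column name avg_price, the object names dt and res, and the set.seed call are illustrative additions and do not come from the original answer:

```r
library(data.table)

# Rebuild the example columns needed for the calculation (seed added for repeatability).
set.seed(1)
dt <- data.table(
  p  = runif(20) * 100,
  q  = sample(1:100, 20, replace = TRUE),
  tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b")
)

# One row per level of tt: quantity-weighted average price within the group.
res <- dt[, .(avg_price = sum(p * q) / sum(q)), by = tt]
res
```

The `by = tt` argument does the grouping, so nothing in the call changes when tt has thousands of levels.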

Answer 2

If I understood your problem correctly, this should be the answer. Give it a try and respond, so that I can adjust it if needed.

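# Quantity-weighted average price for each level passed in tt:
# sum(p * q) / sum(q), computed only over the rows of that level.
myRes <- function(tt) {
  out <- NULL
  for (var in tt) {
    index <- which(df$tt == var)  # rows belonging to this level
    out <- c(out, sum(df[index, "p"] * df[index, "q"]) / sum(df[index, "q"]))
  }
  names(out) <- tt  # label each value with its level
  return(out)
}

# factor() makes this work whether tt is stored as a factor or as character.
threeValue <- myRes(levels(factor(df[, "tt"])))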

2 answers

Answer 1
1 2013-09-16 10:10:20

Answer 2
0 Accepted 2013-09-16 10:11:12
