
How to loop through column names in R, perform operations, and store the results in a list of unknown row size

I am new to R and am trying to write a loop over a large number of columns that weights the data by a certain metric.

I have a large data set of variables (some factors, some numeric). I want to loop through the columns, determine which ones are factors, and, for each factor, use tapply to do some weighting and return a mean. I have written a function that does this for one column at a time:

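weight.by.mean <- function(metric, by, x, funct = sum) {
  ## The default must be the function itself (sum), not a call to it (sum())
  if (is.factor(x)) {
    a <- tapply(metric, x, funct)   ## metric aggregated within each level of x
    b <- tapply(by, x, funct)       ## weights aggregated within each level of x
    return(a / b)
  }
}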

I pass in the metric that I want to weight, and the by argument is what I am weighting the metric by. x is simply the factor variable that I would like to group by.

Example: I have 5 donut types (my argument x) and I would like to see the mean dough (my argument metric) used per donut type, but I need to weight the dough used by the amount (argument by) of dough used for that donut type.

In other words, I am trying to avoid skewing my means by not weighting some donut types more than others (maybe I use a lot of normal dough for glazed donuts but not as much special dough for cream-filled donuts). I hope this makes sense!
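To make the weighting concrete, here is a minimal sketch on made-up data. The data frame donuts and its columns type, dough, and amount are hypothetical and only serve as an illustration; the function is the one from above, with its default written as funct = sum so that it can be called without a fourth argument.

```r
# Hypothetical toy data: every name and number here is made up for illustration
donuts <- data.frame(
  type   = factor(c("glazed", "glazed", "cream", "cream", "cream")),
  dough  = c(10, 20, 3, 6, 9),   # the metric to be weighted
  amount = c(2, 4, 1, 2, 3)      # the weights ("by")
)

weight.by.mean <- function(metric, by, x, funct = sum) {
  if (is.factor(x)) {
    a <- tapply(metric, x, funct)  # aggregate the metric per level of x
    b <- tapply(by, x, funct)      # aggregate the weights per level of x
    return(a / b)                  # one value per factor level
  }
}

res <- weight.by.mean(donuts$dough, donuts$amount, donuts$type)
print(res)
```

The result is a named vector with one entry per level of the factor, so its length changes from factor to factor.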

This is the function I am working on to loop through a large data set with many possible factor variables, such as "donut type" in my example above. It does not yet do what I want because I am not sure what else to add. Thank you for any assistance you can provide. I have been using R for less than a month, so please keep that in mind.

My end goal is to output a matrix or data frame of all these means, but each factor may have anywhere from 5 to 50 levels, so the number of rows depends on the number of levels of each factor.

weight.matrix <- function(df, metric, by, funct = sum) {

  n <- ncol(df)            ## Number of columns to iterate through
  ColNames <- names(df)
  Output <- list()         ## One vector of weighted means per factor column

  for (i in seq_len(n)) {
    ## Test the column itself; is.factor() on a pasted string is always FALSE
    if (is.factor(df[[i]])) {
      a <- tapply(metric, df[[i]], funct)
      b <- tapply(by, df[[i]], funct)
      Output[[ColNames[i]]] <- a / b
    }
  }

  ## Still open: how to combine these vectors of different lengths
  ## into a single matrix or data frame
  Output
}

If each of your factors has different levels, it makes more sense to use a long data frame instead of a wide one. For example:

Metric      Value        Mean
DonutType   Glazed       3.0
DonutType   Chocolate    5.2
DonutSize   Small        1.2
DonutSize   Medium       2.3
DonutSize   Large        3.6

Data frames are not meant for vectors of different lengths. If you want to store your data in a data frame, you need to organize it so that all the vectors have the same length. gather() and spread() are functions from the tidyr package (part of the tidyverse) that you can use to convert between long and wide data frames; in current versions of tidyr they are superseded by pivot_longer() and pivot_wider().
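As a rough sketch of the long layout shown above, the per-factor results can be stacked with base R alone. The helper name weighted.means.long and the toy data are assumptions made for illustration; they are not from the original post.

```r
# Hypothetical helper: builds one long data frame with a row per factor level
weighted.means.long <- function(df, metric, by, funct = sum) {
  # Names of the columns that are factors
  factor.cols <- names(df)[vapply(df, is.factor, logical(1))]

  pieces <- lapply(factor.cols, function(col) {
    ratio <- tapply(metric, df[[col]], funct) / tapply(by, df[[col]], funct)
    data.frame(Metric = col,
               Value  = names(ratio),
               Mean   = as.vector(ratio),
               stringsAsFactors = FALSE)
  })

  # Stack the pieces; every piece has the same three columns
  do.call(rbind, pieces)
}

# Made-up data for illustration only
donuts <- data.frame(
  DonutType = factor(c("Glazed", "Glazed", "Chocolate", "Chocolate")),
  DonutSize = factor(c("Small", "Large", "Small", "Medium")),
  dough     = c(10, 20, 6, 9),
  amount    = c(2, 4, 3, 3)
)

long <- weighted.means.long(donuts, donuts$dough, donuts$amount)
print(long)
```

Because each factor contributes as many rows as it has levels, the number of levels no longer has to match across factors. Non-factor columns are skipped, and a data frame without any factor columns yields NULL in this sketch.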
