How do I get a data.frame from R's aggregate function in the right format?

I'm having trouble getting R's aggregate() function to return a data.frame in the format that I'd like.

Basically I run the aggregation like so:

aggregate(df$res, list(full$depth), summary)

where the res column contains TRUE, FALSE and NA. I want to count how many times each value of res occurs within the groups in depth, which takes six numeric values: 0, 5, 15, 30, 60 and 100. According to the help page for aggregate, the by values are coerced to factors, so this shouldn't be a problem (as far as I can tell).

So I run the aggregate function and store the result in a data.frame. This is fine; it runs without error. The summary displayed in the R console looks like this:

  Group.1  x.Mode x.FALSE x.TRUE x.NA's
1       0 logical       3     83      0
2       5 logical       3     83      0
3      15 logical       8     78      0
4      30 logical       5     79      2
5      60 logical       1     64     21
6     100 logical       1     24     61

Again, this is fine, and looks like what I want. But the data.frame containing the results actually has only two columns, and looks like this:

    Group.1 x
1   0   logical
2   5   logical
3   15  logical
4   30  logical
5   60  logical
6   100 logical
7       3
8       3
9       8
10      5
11      1
12      1
13      83
14      83
15      78
16      79
17      64
18      24
19      0
20      0
21      0
22      2
23      21
24      61

I understand from the aggregate() help page that:

If the by has names, the non-empty times are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].

which suggests to me that if the by has names, the output data.frame would look more like the summary that gets printed to the R console (i.e. it would have 5 columns, including a column of counts for each level in by) than the two-column version it actually gets saved as. The trouble is that the help page doesn't explain at all what a named by variable is, especially if it's coerced to a list from a data.frame column as in my case.

What do I need to do differently so that the data.frame returned by aggregate() has a column of counts for each level of by, as the help suggests it could if I knew what I was doing?

This is because the result of aggregate is fairly odd in this case: the last column is actually a matrix with four columns. The result looks like a 5-column data frame, but it's really a 2-column data frame whose second column is a 4-column matrix. Here is a workaround to convert it to a normal data.frame:

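# Random example data: 100 logical values (including NAs) split into four groups
X <- aggregate(sample(c(TRUE, FALSE, NA), 100, replace = TRUE), list(rep(letters[1:4], 25)), summary)
# Keep the grouping column and bind the matrix's columns on as ordinary columns
X <- cbind(X[-ncol(X)], X[[ncol(X)]])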
str(X)
# 'data.frame':  4 obs. of  5 variables:
# $ Group.1: chr  "a" "b" "c" "d"
# $ Mode   : Factor w/ 1 level "logical": 1 1 1 1
# $ FALSE  : Factor w/ 4 levels "10","4","6","8": 3 2 4 1
# $ TRUE   : Factor w/ 2 levels "15","8": 2 1 2 2
# $ NA's   : Factor w/ 4 levels "11","6","7","9": 1 2 4 3
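
To see the matrix column directly, before any flattening, something like the following sketch may help. The data here are made up (the names vals and groups are hypothetical) and deliberately fixed, so that every group contains TRUE, FALSE and NA and summary returns the same four entries for each group:

```r
# Hypothetical, deterministic data: each group has 3 TRUE, 2 FALSE and 1 NA
groups <- rep(letters[1:4], each = 6)
vals   <- rep(c(TRUE, TRUE, TRUE, FALSE, FALSE, NA), times = 4)

X <- aggregate(vals, list(groups), summary)

ncol(X)          # number of real columns: Group.1 and x
names(X)         # the column names of the data.frame itself
is.matrix(X$x)   # whether the second column is itself a matrix
dim(X$x)         # rows = groups, columns = entries returned by summary
colnames(X$x)    # the labels that print as x.Mode, x.FALSE, ...
```

The printed data.frame shows the matrix's columns with an x. prefix, which is why the console output looks wider than the object really is.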

The oddness of the result comes from summary returning a length-4 vector instead of a single value.
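
As an alternative sketch (not the method used above), one could pass aggregate a counting function that always returns the same named values, name the elements of by so the grouping column gets a proper label, and then spread the matrix column with do.call(data.frame, ...). The data and the helper count_res are hypothetical; only the names df, res and depth are borrowed from the question:

```r
# Hypothetical example data; the seed only makes the sketch reproducible
set.seed(1)
df <- data.frame(
  depth = rep(c(0, 5, 15, 30, 60, 100), each = 10),
  res   = sample(c(TRUE, FALSE, NA), 60, replace = TRUE)
)

# Hypothetical helper: always returns three named counts, so every group
# yields a vector of the same length
count_res <- function(v) {
  c(n_false = sum(!v, na.rm = TRUE),
    n_true  = sum(v, na.rm = TRUE),
    n_na    = sum(is.na(v)))
}

# A "named by" is simply a named list: list(depth = ...) labels the grouping
# column "depth" instead of "Group.1"
agg <- aggregate(df["res"], by = list(depth = df$depth), FUN = count_res)

# agg$res is still a matrix column; data.frame() expands a matrix argument
# into one column per matrix column
flat <- do.call(data.frame, agg)
str(flat)
```

Because the counts are computed as integers rather than taken from summary's character output, the resulting columns can be used in arithmetic without further conversion.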
