Issue with naming new column using aggregate

For some reason, aggregate is giving me the wrong column names, even though the data still come out correct. Can anyone tell me why (am I doing something wrong)?

For example, with a dataframe df:

df <- structure(list(Site = c(1L, 1L, 1L, 2L, 2L, 2L), Sample = c(1L, 
2L, 3L, 1L, 2L, 3L), Diameter = 1:6), .Names = c("Site", "Sample", 
"Diameter"), class = "data.frame", row.names = c(NA, -6L))

which looks like

  Site Sample Diameter
1    1      1        1
2    1      2        2
3    1      3        3
4    2      1        4
5    2      2        5
6    2      3        6

I run the following code:

# Add column to calculate area from diameter
df['Area'] = ((df['Diameter']/2)^2)*pi

# Subset sites
Site1 <- subset(df, Site == "1")

# Calculate total area for each site
Site1_area <- aggregate(Site1$Area, by=list(Sample=Site1$Sample), sum, na.rm=TRUE)

Site1_area

This gives the new dataframe Site1_area as

  Sample  Diameter
1      1 0.7853982
2      2 3.1415927
3      3 7.0685835

where the calculated areas have been preserved, but the column name is now incorrectly given as Diameter instead of Area. I know I can rename this using

colnames(Site1_area) <- c("Sample", "Area")

but it seems odd to me that the column isn't being named correctly to begin with. Can anyone tell me why? Am I doing something incorrectly?

Many thanks!

You made an error that wasn't caught when you did this:

df['Area'] = ((df['Diameter']/2)^2)*pi

It should have been:

df[['Area']] = ((df[['Diameter']]/2)^2)*pi

After the original single-bracket assignment you had:

> df
  Site Sample Diameter   Diameter
1    1      1        1  0.7853982
2    1      2        2  3.1415927
3    1      3        3  7.0685835
4    2      1        4 12.5663706
5    2      2        5 19.6349541
6    2      3        6 28.2743339
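
The difference between the two indexing forms can be checked directly. This is only a sketch: it rebuilds the same example data with data.frame(), and the names area_df and area_vec are illustrative, not from the original post.

```r
# Same example data as in the question.
df <- data.frame(Site = c(1L, 1L, 1L, 2L, 2L, 2L),
                 Sample = c(1L, 2L, 3L, 1L, 2L, 3L),
                 Diameter = 1:6)

# Single brackets return a one-column data frame, and arithmetic on it
# yields another data frame that still carries the name "Diameter".
area_df <- ((df["Diameter"] / 2)^2) * pi
class(area_df)
names(area_df)

# Double brackets return a plain numeric vector with no name attached.
area_vec <- ((df[["Diameter"]] / 2)^2) * pi
class(area_vec)
names(area_vec)

# Assigning the vector with [[ ]] creates a column that is named "Area".
df[["Area"]] <- area_vec
names(df)
```

The single-bracket result is still a data frame whose only column is called Diameter, and that is the name that travels with it.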

So you never really had a column named "Area". If you want the labeling to be simple, try the formula method of aggregate:

> Site1_area2 <- aggregate(Area ~ Sample, data=Site1, sum, na.rm=TRUE)
> Site1_area2
  Sample      Area
1      1 0.7853982
2      2 3.1415927
3      3 7.0685835
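
For comparison, here is a sketch of how the non-formula interface used in the question picks its column names. It assumes Area has been stored as a plain numeric vector via the [[ ]] form, and the names by_vector and by_frame are illustrative only.

```r
# Rebuild the example data with Area stored as a plain numeric vector.
df <- data.frame(Site = c(1L, 1L, 1L, 2L, 2L, 2L),
                 Sample = c(1L, 2L, 3L, 1L, 2L, 3L),
                 Diameter = 1:6)
df[["Area"]] <- ((df[["Diameter"]] / 2)^2) * pi
Site1 <- subset(df, Site == 1)

# A bare vector has no name to carry over, so the value column is called "x".
by_vector <- aggregate(Site1$Area, by = list(Sample = Site1$Sample), FUN = sum)
names(by_vector)

# A one-column data frame keeps its own column name in the result.
by_frame <- aggregate(Site1["Area"], by = list(Sample = Site1$Sample), FUN = sum)
names(by_frame)
```

aggregate takes the value column's name from whatever data frame it is handed. If Site1$Area is itself a one-column data frame named Diameter, as the answer describes, that would explain the Diameter header in the question's output.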
