User-defined function based on multiple columns grouped by category
This may be basic, but I've been trying for days and haven't found an answer.
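I'm trying to compute a new quantity from two columns, "concentration" and "area", grouped by "catchment". I wrote a function that, for each row, computes the difference between that row's concentration and the concentration of the row with the largest area, scaled by the row's area relative to the largest area in that catchment. It doesn't work with dplyr or aggregate. It works fine with by, but then it returns a list.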
Ideally, I'd like to add a column to the data frame or replace the concentration column entirely. Here is the data frame "lev":
area catchment concentration
1 1 Yup 2.00000
2 10 Yup 40.50000
3 25 Yup 50.82031
4 35 Yup 50.00000
5 1 Nope 1.00000
6 10 Nope 5.00000
7 25 Nope 40.08333
8 35 Nope 38.00000
Here is the function:
lever <- function(data=lev, x=data[,"concentration"], y=data[,"area"]){
N= which.max(y)
L = (x - x[N]) * y/max(y)
return(L)}
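In formula form (x = concentration, y = area, both restricted to one catchment), the function computes the quantity below. The second line works through row 2 of the Yup catchment as a quick check; it reproduces the -2.7142857 in the expected result.

```latex
L_i = \left(x_i - x_N\right)\frac{y_i}{\max_j y_j}, \qquad N = \arg\max_j \, y_j
L_2^{\text{Yup}} = (40.5 - 50)\cdot\frac{10}{35} = -\frac{95}{35} \approx -2.7142857
```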
Here is the expected result:
area catchment concentration leverage
1 1 Yup 2.00000 -1.3714286
2 10 Yup 40.50000 -2.7142857
3 25 Yup 50.82031 0.5859375
4 35 Yup 50.00000 0.0000000
5 1 Nope 1.00000 -1.0571429
6 10 Nope 5.00000 -9.4285714
7 25 Nope 40.08333 1.4880952
8 35 Nope 38.00000 0.0000000
Using by, I get a list with one result per catchment:
by(lev, lev$catchment, lever)
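If the goal is only to get the per-catchment results of the original lever() back into lev as a column, base R can do it without by(). The sketch below uses split() to get one data frame per catchment, applies lever() to each piece with lapply(), and uses unsplit() to put the results back in the original row order. The column name leverage is an illustrative choice.

```r
# Example data from the question
lev <- read.table(text = "area catchment concentration
1 Yup 2.00000
10 Yup 40.50000
25 Yup 50.82031
35 Yup 50.00000
1 Nope 1.00000
10 Nope 5.00000
25 Nope 40.08333
35 Nope 38.00000", header = TRUE)

# The question's function: x and y default to columns of the data passed in
lever <- function(data = lev, x = data[, "concentration"], y = data[, "area"]) {
  N <- which.max(y)              # row with the largest area
  (x - x[N]) * y / max(y)
}

# One data frame per catchment -> one numeric vector per catchment
pieces <- lapply(split(lev, lev$catchment), lever)

# unsplit() reassembles the vectors in the original row order of lev
lev$leverage <- unsplit(pieces, lev$catchment)
lev
```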
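But I want to use the function on several columns grouped by several factors (for example, by date in addition to catchment), and with doBy and dplyr I get "dimension" errors.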
We can use the tidyverse:
library(tidyverse)
lev %>%
  group_by(catchment) %>%
  mutate(leverage = (concentration - concentration[which.max(area)]) * area/max(area))
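A likely reason the original lever() did not work inside dplyr is that it expects a whole data frame, while mutate() works with the column vectors of each group. Here is a minimal sketch, assuming a hypothetical vector-based variant named lever_vec() (not from the question), that keeps the calculation in a reusable function and still works per group:

```r
library(dplyr)

lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Hypothetical helper: takes the two columns as vectors instead of a data frame
lever_vec <- function(x, y) {
  N <- which.max(y)              # row with the largest area in this group
  (x - x[N]) * y / max(y)
}

res <- lev %>%
  group_by(catchment) %>%
  mutate(leverage = lever_vec(concentration, area)) %>%  # evaluated per catchment
  ungroup()
res
```

mutate() keeps the original row order, so the new column lines up with the expected result in the question.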
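Based on the description, if there are several grouping variables, put them all in group_by(), and the calculation can be applied to several columns at once with mutate_each() (superseded in current dplyr by across() inside mutate()).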
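A sketch of both ideas together, assuming dplyr 1.0.0 or later for across(). The data are hypothetical: the date column and the second measured column turbidity are invented here only to show two grouping variables and two value columns. Each new column is named after its source column with a _lev suffix.

```r
library(dplyr)

# Hypothetical data: 'date' and 'turbidity' are invented for illustration
dat <- data.frame(
  date = rep(c("d1", "d2"), each = 4),
  catchment = rep(c("Yup", "Yup", "Nope", "Nope"), 2),
  area = rep(c(10, 35), 4),
  concentration = c(40.5, 50, 5, 38, 30, 45, 8, 20),
  turbidity = c(6, 12, 2, 5, 4, 10, 3, 9)
)

res <- dat %>%
  group_by(date, catchment) %>%              # several grouping variables
  mutate(across(c(concentration, turbidity),  # several value columns
                ~ (.x - .x[which.max(area)]) * area / max(area),
                .names = "{.col}_lev")) %>%
  ungroup()
res
```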
Load the data:
lev <- read.table(text = "area catchment concentration
1 Yup 2.00000
10 Yup 40.50000
25 Yup 50.82031
35 Yup 50.00000
1 Nope 1.00000
10 Nope 5.00000
25 Nope 40.08333
35 Nope 38.00000",
header=TRUE)
Group by catchment:
library(dplyr)
lev %>%
group_by(catchment) %>%
mutate(N = which.max(area),
L = (concentration - concentration[N]) * area/max(area))
#
# area catchment concentration N L
# <int> <fctr> <dbl> <int> <dbl>
# 1 1 Yup 2.00000 4 -1.3714286
# 2 10 Yup 40.50000 4 -2.7142857
# 3 25 Yup 50.82031 4 0.5859357
# 4 35 Yup 50.00000 4 0.0000000
# 5 1 Nope 1.00000 4 -1.0571429
# 6 10 Nope 5.00000 4 -9.4285714
# 7 25 Nope 40.08333 4 1.4880929
# 8 35 Nope 38.00000 4 0.0000000
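The question also mentions replacing the concentration column entirely. A minimal sketch under that assumption: assign the same expression back to concentration. The right-hand side is evaluated with the group's original concentration values, and the helper column N is not needed.

```r
library(dplyr)

lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Overwrite concentration in place, computed per catchment
lev_replaced <- lev %>%
  group_by(catchment) %>%
  mutate(concentration = (concentration - concentration[which.max(area)]) *
           area / max(area)) %>%
  ungroup()
lev_replaced
```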
I modified your function so that it returns a data frame:
lever2 <- function(data,
                   x = data[["concentration"]],
                   y = data[["area"]]){
  # Use [[ ]] to extract the column as a plain vector (works for data frames and tibbles)
N <- which.max(y)
L <- (x - x[N]) * y/max(y)
# Put L back into the data frame
# so that we keep the concentration and area in the result
data$L <- L
return(data)
}
You can then use this function with dplyr::group_by %>% do:
lev %>%
group_by(catchment) %>%
do( lever2(.))
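do() is superseded in current dplyr. A roughly equivalent sketch with group_modify() (available since dplyr 0.8.0) is below. group_modify() passes each group to the function as .x without the grouping column and adds catchment back to the result; groups come out sorted by key, so the Nope rows appear before the Yup rows.

```r
library(dplyr)

lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# lever2() as above: returns the group's data with an extra column L
lever2 <- function(data,
                   x = data[["concentration"]],
                   y = data[["area"]]) {
  N <- which.max(y)
  data$L <- (x - x[N]) * y / max(y)
  data
}

res <- lev %>%
  group_by(catchment) %>%
  group_modify(~ lever2(.x)) %>%   # .x is one catchment's rows
  ungroup()
res
```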
You can also use data.table to compute the value:
library(data.table)
# convert to data.table
setDT(df)
df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
by=catchment]
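For the question's broader case (several value columns, several grouping factors), a data.table sketch could list all grouping columns in by and the value columns in .SDcols. The date and turbidity columns and the helper lev_fun() are hypothetical, added only for illustration; area is passed explicitly so each group's area vector reaches the helper.

```r
library(data.table)

# Hypothetical data: 'date' and 'turbidity' are invented for illustration
DT <- data.table(
  date = rep(c("d1", "d2"), each = 4),
  catchment = rep(c("Yup", "Yup", "Nope", "Nope"), 2),
  area = rep(c(10, 35), 4),
  concentration = c(40.5, 50, 5, 38, 30, 45, 8, 20),
  turbidity = c(6, 12, 2, 5, 4, 10, 3, 9)
)

# Hypothetical helper: leverage of one value column x given the group's areas a
lev_fun <- function(x, a) (x - x[which.max(a)]) * a / max(a)

cols <- c("concentration", "turbidity")
# Adds concentration_lev and turbidity_lev by reference, computed per date/catchment
DT[, paste0(cols, "_lev") := lapply(.SD, lev_fun, a = area),
   by = .(date, catchment), .SDcols = cols]
DT
```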
df
area catchment concentration leverage
1: 1 Yup 2.00000 -1.3714286
2: 10 Yup 40.50000 -2.7142857
3: 25 Yup 50.82031 0.5859357
4: 35 Yup 50.00000 0.0000000
5: 1 Nope 1.00000 -1.0571429
6: 10 Nope 5.00000 -9.4285714
7: 25 Nope 40.08333 1.4880929
8: 35 Nope 38.00000 0.0000000
Data:
df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L),
catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope",
"Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031,
50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment",
"concentration"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))