User defined function based on multiple columns grouped by category
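This may be basic, but I have been trying for several days without finding an answer.
I am trying to calculate a new quantity from two columns, "concentration" and "area", grouped by "catchment". I wrote a function that computes, for each row, the difference in concentration from the row with the largest area, scaled by that row's area as a proportion of the largest area in the catchment. It does not work with dplyr or aggregate. It works fine with by, but that returns a list.
Ideally, I would like to add a column to the data frame or replace the concentration column entirely. Here is the data frame "lev":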
area catchment concentration
1 1 Yup 2.00000
2 10 Yup 40.50000
3 25 Yup 50.82031
4 35 Yup 50.00000
5 1 Nope 1.00000
6 10 Nope 5.00000
7 25 Nope 40.08333
8 35 Nope 38.00000
Here is the function:
lever <- function(data=lev, x=data[,"concentration"], y=data[,"area"]){
N= which.max(y)
L = (x - x[N]) * y/max(y)
return(L)}
Here is the expected result:
area catchment concentration leverage
1 1 Yup 2.00000 -1.3714286
2 10 Yup 40.50000 -2.7142857
3 25 Yup 50.82031 0.5859375
4 35 Yup 50.00000 0.0000000
5 1 Nope 1.00000 -1.0571429
6 10 Nope 5.00000 -9.4285714
7 25 Nope 40.08333 1.4880952
8 35 Nope 38.00000 0.0000000
With by, I get the result as a list with one element per catchment:
by(lev, lev$catchment, lever)
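However, I want to use the function on multiple columns grouped by several factors (for example, by date in addition to catchment), and I get "dimension" errors with doBy and dplyr.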
We can use tidyverse (df1 here stands for the question's data frame):
library(tidyverse)
df1 %>%
group_by(catchment) %>%
mutate(leverage = (concentration- concentration[which.max(area)]) * area/max(area))
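Based on the description, if there are several grouping variables, put them all in group_by. The same calculation can also be applied to multiple columns with mutate_each (deprecated in later dplyr releases, where across fills that role).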
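As a rough sketch of how that might look with dplyr 1.0 or later: the example below groups by two variables and applies the calculation to two columns with across(). The `date` and `nitrate` columns, the second date's values, and the name `lev2` are invented for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical data: `date` and `nitrate` are assumed columns, not from the question
lev2 <- data.frame(
  area          = rep(c(1, 10, 25, 35), times = 4),
  catchment     = rep(c("Yup", "Nope"), each = 4, times = 2),
  date          = rep(c("2016-01-01", "2016-02-01"), each = 8),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38,
                    3, 30, 45, 44, 2, 6, 35, 30)
)
lev2$nitrate <- lev2$concentration / 2

res <- lev2 %>%
  group_by(catchment, date) %>%
  # For each selected column, subtract its value at the largest-area row
  # of the group, then scale by area / max(area) within the group
  mutate(across(c(concentration, nitrate),
                ~ (.x - .x[which.max(area)]) * area / max(area),
                .names = "{.col}_leverage")) %>%
  ungroup()

res
```

The `.names` argument keeps the original columns and writes the results to new `*_leverage` columns; dropping it would overwrite the inputs instead.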
Load the data:
lev <- read.table(text = "area catchment concentration
1 Yup 2.00000
10 Yup 40.50000
25 Yup 50.82031
35 Yup 50.00000
1 Nope 1.00000
10 Nope 5.00000
25 Nope 40.08333
35 Nope 38.00000",
header=TRUE)
Group by catchment:
library(dplyr)
lev %>%
group_by(catchment) %>%
mutate(N = which.max(area),
L = (concentration - concentration[N]) * area/max(area))
#
# area catchment concentration N L
# <int> <fctr> <dbl> <int> <dbl>
# 1 1 Yup 2.00000 4 -1.3714286
# 2 10 Yup 40.50000 4 -2.7142857
# 3 25 Yup 50.82031 4 0.5859357
# 4 35 Yup 50.00000 4 0.0000000
# 5 1 Nope 1.00000 4 -1.0571429
# 6 10 Nope 5.00000 4 -9.4285714
# 7 25 Nope 40.08333 4 1.4880929
# 8 35 Nope 38.00000 4 0.0000000
I modified your function so that it returns a data frame.
lever2 <- function(data,
x = data[,"concentration"][[1]],
y = data[,"area"][[1]]){
# Use [[1]] to extract the vector only
N <- which.max(y)
L <- (x - x[N]) * y/max(y)
# Put L back into the data frame
# so that we keep the concentration and area in the result
data$L <- L
return(data)
}
The function can then be used with dplyr::group_by %>% do:
lev %>%
group_by(catchment) %>%
do( lever2(.))
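Since do() is marked as superseded in current dplyr, a similar pattern can probably be written with group_modify() instead. The sketch below assumes dplyr 0.8.1 or later; `lever3` is a hypothetical variant of the function above that uses `[[ ]]` so it behaves the same for plain data frames and tibbles.

```r
library(dplyr)

lev <- data.frame(
  area          = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L),
  catchment     = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Hypothetical helper: same calculation as lever2, returning the data
# with an extra column L
lever3 <- function(data) {
  x <- data[["concentration"]]
  y <- data[["area"]]
  data$L <- (x - x[which.max(y)]) * y / max(y)
  data
}

res <- lev %>%
  group_by(catchment) %>%
  # .x is the group's rows without the grouping column;
  # group_modify() adds catchment back as the first column
  group_modify(~ lever3(.x)) %>%
  ungroup()

res
```

Note that group_modify() returns the groups in sorted order, so the "Nope" rows come before the "Yup" rows in the result.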
You can also use data.table to compute the value:
library(data.table)
# convert to data.table
setDT(df)
df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
by=catchment]
df
area catchment concentration leverage
1: 1 Yup 2.00000 -1.3714286
2: 10 Yup 40.50000 -2.7142857
3: 25 Yup 50.82031 0.5859357
4: 35 Yup 50.00000 0.0000000
5: 1 Nope 1.00000 -1.0571429
6: 10 Nope 5.00000 -9.4285714
7: 25 Nope 40.08333 1.4880929
8: 35 Nope 38.00000 0.0000000
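The question also asks about applying the calculation to several columns. One possible way to do that in data.table is with .SD and .SDcols, sketched below. The `nitrate` column and the name `dt` are assumptions made up for the example; further grouping variables such as a date could be added as `by = .(catchment, date)`.

```r
library(data.table)

dt <- data.table(
  area          = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L),
  catchment     = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)
dt[, nitrate := concentration / 2]  # hypothetical second measurement

cols <- c("concentration", "nitrate")

# .SD holds only the columns named in .SDcols; `area` is still visible
# in j, so each column is compared against its value at the largest area
dt[, paste0(cols, "_leverage") :=
       lapply(.SD, function(x) (x - x[which.max(area)]) * area / max(area)),
   by = catchment, .SDcols = cols]

dt
```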
Data:
df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L),
catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope",
"Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031,
50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment",
"concentration"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))