User-defined function based on multiple columns, grouped by category

Question

This may be basic, but I have been trying for days and have not found an answer.

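I am trying to compute a new quantity from two columns, concentration and area, grouped by catchment. I wrote a function that takes, for each row, the difference between its concentration and the concentration of the row with the largest area, and scales that difference by the row's area as a proportion of the largest area in the catchment. It does not work with dplyr or aggregate. It works fine with by, but that returns a list.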

Ideally, I would like to add a column to the data frame, or replace the concentration column entirely. Here is the data frame, lev:

  area catchment concentration
1    1       Yup       2.00000
2   10       Yup      40.50000
3   25       Yup      50.82031
4   35       Yup      50.00000
5    1      Nope       1.00000
6   10      Nope       5.00000
7   25      Nope      40.08333
8   35      Nope      38.00000

Here is the function:

lever <- function(data=lev, x=data[,"concentration"], y=data[,"area"]){
N= which.max(y) 
L = (x - x[N]) * y/max(y)
return(L)}

Here is the expected result:

   area catchment concentration   leverage
1    1       Yup       2.00000 -1.3714286
2   10       Yup      40.50000 -2.7142857
3   25       Yup      50.82031  0.5859375
4   35       Yup      50.00000  0.0000000
5    1      Nope       1.00000 -1.0571429
6   10      Nope       5.00000 -9.4285714
7   25      Nope      40.08333  1.4880952
8   35      Nope      38.00000  0.0000000

With by, I get a list with one set of results per catchment:

by(lev, lev$catchment, lever)

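But I would like to use the function on several columns grouped by more than one factor (for example by date in addition to catchment), and then I get

"dimension error"

along with errors from doBy and dplyr.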

Answer 1

We can use the tidyverse (df1 below stands for the question's data frame):

library(tidyverse)
df1 %>% 
  group_by(catchment) %>%
  mutate(leverage = (concentration- concentration[which.max(area)]) * area/max(area))

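As described in the question, if there are several grouping variables, put them all in group_by. The calculation can also be applied to several columns with mutate_each (since deprecated; in current dplyr, across() inside mutate does the same job).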
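A minimal sketch of that idea, assuming dplyr 1.0.0 or later. The date and nitrate columns, the lev2 data, and the helper name leverage are all made up for illustration; they are not in the question.

```r
library(dplyr)

# Hypothetical data: `date` and `nitrate` do not appear in the question
lev2 <- data.frame(
  date = rep(c("2017-01-01", "2017-02-01"), each = 4),
  catchment = rep(c("Yup", "Nope"), each = 2, times = 2),
  area = rep(c(10, 35), times = 4),
  concentration = c(40.5, 50, 5, 38, 42, 49, 6, 30),
  nitrate = c(1.2, 2.0, 0.4, 0.9, 1.5, 1.9, 0.5, 1.1)
)

# Hypothetical helper: the question's formula on plain vectors
leverage <- function(x, area) {
  (x - x[which.max(area)]) * area / max(area)
}

res <- lev2 %>%
  group_by(catchment, date) %>%                 # two grouping variables
  mutate(across(c(concentration, nitrate),      # two measured columns
                ~ leverage(.x, area),           # `area` is the current group's area
                .names = "{.col}_leverage")) %>%
  ungroup()
```

Within each catchment and date, the row with the largest area gets a leverage of 0 in both new columns, as in the single-column case.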

Answer 2

Load the data:

lev <- read.table(text = "area catchment concentration
    1       Yup       2.00000
   10       Yup      40.50000
   25       Yup      50.82031
   35       Yup      50.00000
    1      Nope       1.00000
   10      Nope       5.00000
   25      Nope      40.08333
   35      Nope      38.00000", 
   header=TRUE)

Group by catchment

library(dplyr)
lev %>% 
    group_by(catchment) %>% 
    mutate(N = which.max(area),
           L = (concentration - concentration[N]) * area/max(area))

# 
#    area catchment concentration     N          L
#   <int>    <fctr>         <dbl> <int>      <dbl>
# 1     1       Yup       2.00000     4 -1.3714286
# 2    10       Yup      40.50000     4 -2.7142857
# 3    25       Yup      50.82031     4  0.5859357
# 4    35       Yup      50.00000     4  0.0000000
# 5     1      Nope       1.00000     4 -1.0571429
# 6    10      Nope       5.00000     4 -9.4285714
# 7    25      Nope      40.08333     4  1.4880929
# 8    35      Nope      38.00000     4  0.0000000

Using your function

I modified your function so that it returns a data frame.

lever2 <- function(data, 
                   x = data[,"concentration"][[1]], 
                   y = data[,"area"][[1]]){
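    # [[1]] pulls the vector out of the one-column tibble that
    # group_by() %>% do() passes in. On a plain data.frame,
    # data[, "concentration"] is already a vector, so [[1]] would
    # keep only its first element.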
    N <- which.max(y)
    L <- (x - x[N]) * y/max(y)
    # Put L back into the data frame 
    # so that we keep the concentration and area in the result
    data$L <- L
    return(data)
    }

The function can then be used with dplyr's group_by() %>% do():

lev %>% 
    group_by(catchment) %>% 
    do( lever2(.))
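For comparison, here is a base R sketch of the same split-apply-combine step, which avoids the list that by() returns. The helper name leverage_vec is made up for this illustration; split(), lapply() and unsplit() are standard base R functions.

```r
lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Hypothetical helper: same formula as lever(), but on two plain vectors
leverage_vec <- function(conc, area) {
  (conc - conc[which.max(area)]) * area / max(area)
}

# Split into one data frame per catchment, compute the values per piece,
# then put them back in the original row order
pieces <- split(lev, lev$catchment)
values <- lapply(pieces, function(d) leverage_vec(d$concentration, d$area))
lev$leverage <- unsplit(values, lev$catchment)
```

split() and unsplit() also accept a list of factors, so a second grouping variable (say a date column, if the data had one) could be passed as list(lev$catchment, lev$date).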

Answer 3

You can also use data.table to compute the value (df is defined under Data below):

library(data.table)
# convert to data.table
setDT(df)

df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
   by=catchment]
df
   area catchment concentration   leverage
1:    1       Yup       2.00000 -1.3714286
2:   10       Yup      40.50000 -2.7142857
3:   25       Yup      50.82031  0.5859357
4:   35       Yup      50.00000  0.0000000
5:    1      Nope       1.00000 -1.0571429
6:   10      Nope       5.00000 -9.4285714
7:   25      Nope      40.08333  1.4880929
8:   35      Nope      38.00000  0.0000000
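To group by more than one factor, by accepts several columns. A rough sketch, assuming a date column that is not part of the question's data (dt and its values are invented for illustration):

```r
library(data.table)

# Hypothetical data: the `date` column is not in the question
dt <- data.table(
  date = rep(c("2017-01-01", "2017-02-01"), each = 4),
  catchment = rep(c("Yup", "Nope"), each = 2, times = 2),
  area = rep(c(10, 35), times = 4),
  concentration = c(40.5, 50, 5, 38, 42, 49, 6, 30)
)

# Same expression as above, evaluated within each catchment/date pair
dt[, leverage := (concentration - concentration[which.max(area)]) * area / max(area),
   by = .(catchment, date)]
```

Because := modifies dt by reference, no reassignment is needed.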

Data

df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L), 
    catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope", 
    "Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031, 
    50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment", 
"concentration"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8"))

3 answers

Answer 1: 1, 2017-02-22 15:32:03
Answer 2: 1, accepted, 2017-02-22 15:36:50
Answer 3: 1, 2017-02-22 15:39:54
