
User defined function based on multiple columns grouped by category

This may be basic, but I've been trying to figure it out for days and haven't found an answer.

I am trying to calculate a new quantity based on two columns, 'concentration' and 'area', grouped by 'catchment'. I've written a function that, for each row, takes the difference between its concentration and the concentration of the row with the largest area in that catchment, and scales it by the row's area as a proportion of that largest area. It won't work with dplyr or aggregate. It works fine with by, but then returns a list.

Ideally, I want to add a column to the data frame or replace the concentration column altogether. Here is the data frame 'lev':

  area catchment concentration
1    1       Yup       2.00000
2   10       Yup      40.50000
3   25       Yup      50.82031
4   35       Yup      50.00000
5    1      Nope       1.00000
6   10      Nope       5.00000
7   25      Nope      40.08333
8   35      Nope      38.00000

Here is the function:

lever <- function(data = lev, x = data[, "concentration"], y = data[, "area"]) {
  N <- which.max(y)                # row with the largest area
  L <- (x - x[N]) * y / max(y)     # difference, scaled by relative area
  return(L)
}
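Written out, the function computes the following within each catchment. The symbols are introduced here only for illustration: c is concentration, a is area, and j runs over the rows of that catchment.

```latex
N = \arg\max_{j} a_j, \qquad
L_i = (c_i - c_N)\,\frac{a_i}{\max_{j} a_j}
```

For example, for the Yup row with area 10: (40.5 - 50) * 10/35 = -2.7142857, which matches the desired output.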

And here is the desired result:

   area catchment concentration   leverage
1    1       Yup       2.00000 -1.3714286
2   10       Yup      40.50000 -2.7142857
3   25       Yup      50.82031  0.5859375
4   35       Yup      50.00000  0.0000000
5    1      Nope       1.00000 -1.0571429
6   10      Nope       5.00000 -9.4285714
7   25      Nope      40.08333  1.4880952
8   35      Nope      38.00000  0.0000000 

Using by, I can get the results for each catchment, but as a list:

by(lev, lev$catchment, lever)

But I want to use the function on multiple columns grouped by several factors (e.g. date in addition to catchment), and with doBy and dplyr I get 'incorrect number of dimensions' errors.
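For the single-factor case, the per-catchment results can be turned back into a column with base R's split() and unsplit(). This is a sketch, not taken from the answers below; it uses the question's data and a copy of lever() with the data = lev default dropped so the block runs on its own.

```r
# Data from the question (concentration values as printed)
lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = c("Yup", "Yup", "Yup", "Yup", "Nope", "Nope", "Nope", "Nope"),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# The question's lever(), minus the `data = lev` default
lever <- function(data, x = data[, "concentration"], y = data[, "area"]) {
  N <- which.max(y)
  (x - x[N]) * y / max(y)
}

# split() makes one data frame per catchment, lapply() runs lever() on each,
# and unsplit() puts the pieces back in the original row order
pieces <- lapply(split(lev, lev$catchment), lever)
lev$leverage <- unsplit(pieces, lev$catchment)
lev
```

Both split() and unsplit() also accept a list of factors, e.g. list(lev$catchment, lev$date) for a hypothetical date column, in which case passing drop = TRUE to both avoids calling lever() on empty combinations.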

We can use the tidyverse:

library(tidyverse)
lev %>% 
  group_by(catchment) %>%
  mutate(leverage = (concentration - concentration[which.max(area)]) * area/max(area))
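A likely cause of the 'incorrect number of dimensions' error (an assumption, since the failing calls are not shown) is that mutate() works column by column, so lever() receives a plain vector and data[, "area"] has no second dimension to index. The sketch below reproduces that error and then wraps the same formula in a function that takes the two columns directly; lever_vec() is a hypothetical name, not part of any package.

```r
library(dplyr)

lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = c("Yup", "Yup", "Yup", "Yup", "Nope", "Nope", "Nope", "Nope"),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# The question's lever(), minus the `data = lev` default
lever <- function(data, x = data[, "concentration"], y = data[, "area"]) {
  N <- which.max(y)
  (x - x[N]) * y / max(y)
}

# Passing a single column, as happens inside mutate(), fails because
# data[, "area"] needs two dimensions and a vector has only one
msg <- tryCatch(lever(lev$concentration), error = conditionMessage)
print(msg)

# Hypothetical vector-based variant: x is the value column, y the area column
lever_vec <- function(x, y) {
  (x - x[which.max(y)]) * y / max(y)
}

res <- lev %>%
  group_by(catchment) %>%
  mutate(leverage = lever_vec(concentration, area)) %>%
  ungroup()
res
```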

Based on the description, if there are several grouping variables, put them all in group_by(); the calculation can also be applied to several columns at once with mutate_each (since deprecated; current dplyr uses across() inside mutate()).
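A sketch of both ideas together, assuming dplyr 1.0.0 or later for across(). The date and conc2 columns are hypothetical, invented only so there is more than one grouping column and more than one value column.

```r
library(dplyr)

# Hypothetical data: `date` and `conc2` are invented for illustration
lev2 <- data.frame(
  date = rep(c("2017-01-01", "2017-02-01"), each = 4),
  catchment = rep(c("Yup", "Nope"), each = 2, times = 2),
  area = rep(c(1, 35), times = 4),
  concentration = c(2, 50, 1, 38, 4, 20, 3, 9),
  conc2 = c(10, 12, 5, 6, 1, 2, 7, 7)
)

res <- lev2 %>%
  group_by(date, catchment) %>%                    # one group per date/catchment pair
  mutate(across(c(concentration, conc2),           # apply the formula to each value column
                ~ (.x - .x[which.max(area)]) * area / max(area),
                .names = "{.col}_leverage")) %>%   # keep originals, add new columns
  ungroup()
res
```

Using .names keeps the original columns; dropping it would overwrite concentration and conc2 in place, which matches the question's "replace the concentration column" option.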

Loading your data:

lev <- read.table(text = "area catchment concentration
    1       Yup       2.00000
   10       Yup      40.50000
   25       Yup      50.82031
   35       Yup      50.00000
    1      Nope       1.00000
   10      Nope       5.00000
   25      Nope      40.08333
   35      Nope      38.00000", 
   header=TRUE)

Grouped by catchment:

library(dplyr)
lev %>% 
    group_by(catchment) %>% 
    mutate(N = which.max(area),
           L = (concentration - concentration[N]) * area/max(area))

# 
#    area catchment concentration     N          L
#   <int>    <fctr>         <dbl> <int>      <dbl>
# 1     1       Yup       2.00000     4 -1.3714286
# 2    10       Yup      40.50000     4 -2.7142857
# 3    25       Yup      50.82031     4  0.5859357
# 4    35       Yup      50.00000     4  0.0000000
# 5     1      Nope       1.00000     4 -1.0571429
# 6    10      Nope       5.00000     4 -9.4285714
# 7    25      Nope      40.08333     4  1.4880929
# 8    35      Nope      38.00000     4  0.0000000

Using your function

I modified your function so that it returns a data frame.

lever2 <- function(data, 
                   x = data[["concentration"]], 
                   y = data[["area"]]){
    # [[ ]] extracts the column as a plain vector,
    # for both data frames and tibbles
    N <- which.max(y)
    L <- (x - x[N]) * y/max(y)
    # Put L back into the data frame 
    # so that we keep the concentration and area in the result
    data$L <- L
    return(data)
    }

The function can then be used with dplyr::group_by() %>% do():

lev %>% 
    group_by(catchment) %>% 
    do( lever2(.))
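do() is marked as superseded in dplyr 1.0.0. One possible replacement is group_modify() (available since dplyr 0.8.0), which calls a function on each group's rows and stacks the returned data frames. A sketch, with lever2() defined using [[ ]] so it works on tibbles and plain data frames alike:

```r
library(dplyr)

lev <- data.frame(
  area = c(1, 10, 25, 35, 1, 10, 25, 35),
  catchment = c("Yup", "Yup", "Yup", "Yup", "Nope", "Nope", "Nope", "Nope"),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Takes one group's data frame and returns it with an extra column L
lever2 <- function(data, x = data[["concentration"]], y = data[["area"]]) {
  N <- which.max(y)
  data$L <- (x - x[N]) * y / max(y)
  data
}

# .x is the current group's rows (without the grouping column);
# group_modify() adds catchment back and binds the groups together
res <- lev %>%
  group_by(catchment) %>%
  group_modify(~ lever2(.x)) %>%
  ungroup()
res
```

Note that the result comes back ordered by group (Nope before Yup), not in the original row order.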

You can also use data.table to calculate this value:

library(data.table)
# convert to data.table
setDT(df)

df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
   by=catchment]
df
   area catchment concentration   leverage
1:    1       Yup       2.00000 -1.3714286
2:   10       Yup      40.50000 -2.7142857
3:   25       Yup      50.82031  0.5859357
4:   35       Yup      50.00000  0.0000000
5:    1      Nope       1.00000 -1.0571429
6:   10      Nope       5.00000 -9.4285714
7:   25      Nope      40.08333  1.4880929
8:   35      Nope      38.00000  0.0000000

data

df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L), 
    catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope", 
    "Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031, 
    50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment", 
"concentration"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8"))

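To extend the data.table version to several grouping and value columns, list the groups in by and the value columns in .SDcols. A sketch; the date and conc2 columns are hypothetical, invented only to have more than one of each:

```r
library(data.table)

# Hypothetical data: `date` and `conc2` are invented for illustration
dt <- data.table(
  date = rep(c("2017-01-01", "2017-02-01"), each = 4),
  catchment = rep(c("Yup", "Nope"), each = 2, times = 2),
  area = rep(c(1, 35), times = 4),
  concentration = c(2, 50, 1, 38, 4, 20, 3, 9),
  conc2 = c(10, 12, 5, 6, 1, 2, 7, 7)
)

cols <- c("concentration", "conc2")
new_cols <- paste0(cols, "_leverage")

# For each (date, catchment) group, apply the formula to every column in
# .SDcols and add the results as new columns by reference
dt[, (new_cols) := lapply(.SD, function(v) (v - v[which.max(area)]) * area / max(area)),
   by = .(date, catchment), .SDcols = cols]
dt
```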