R: Using get and data.table within a user-defined function

Question

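I am learning how to write functions in R that use common packages such as data.table and dplyr.

The function I wrote calculates the percentage of observations in a particular category within another category (e.g. the share of cars released in 2015 that get 10 to 20 mpg) and produces a table. Here it is without the surrounding function: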

library(data.table)
library(scales)


#Create test dataframe and cut off points
test<-data.frame(x=c(0:10), y=c(rep(1,5),rep(2,6)), z=c("A","A","A","B","B","B","C","C","C","C","C"))
test <- data.table(test)


#trial non function version (calculating share of row by category z): works
tmp<-test[,.(N=.N), keyby=.(y,z)]
tmp[,total:=sum(N), by=y]
tmp[,percent:=percent(N/total)]
dcast(tmp,y ~ z, value.var="percent")

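But to make it work inside a function, I had to use get. Once get has been evaluated, the rest of the code has to refer to the two category variables as "get" and "get.1" (see below). Is there a way to avoid this?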

#Two way table function: data.table

tw_tab<-function(dt,v1,v2){

#set up variables as characters
v1<-as.character(substitute(v1))
v2<-as.character(substitute(v2))
dt<-as.character(substitute(dt))

#function
tmp<-get(dt)[,.(N=.N), keyby=.(get(v1),get(v2))]
tmp[,total:=sum(N), by=get]
tmp[,percent:=percent(N/total)]
dcast(tmp,get ~ get.1, value.var="percent")

}

#test function
tw_tab(test, y, z)

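I tried using just get(v1) and get(v2) throughout the code, but that does not work.

I have looked at other posts about user functions with data.table (e.g. get in user-defined function in data.table), but they do not seem to address this issue.

I am new to this, so any other feedback or suggestions on better ways of doing this would be much appreciated.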

Answer 1

You do not have to call get on dt (in my experience, get is mostly used to refer to columns by a string), and you can supply a character vector to by or keyby:

tw_tab <- function(dt,v1,v2){

    #set up variables as characters
    v1<-as.character(substitute(v1))
    v2<-as.character(substitute(v2))

    #function
    tmp <- dt[,.(N=.N), keyby = c(v1, v2)]
    tmp[,total:=sum(N), by= c(v1)]
    tmp[,percent:=percent(N/total)]
    dcast(tmp, paste(v1, '~', v2), value.var="percent")
}

#test function
tw_tab(test, y, z)
#    y     A     B     C
# 1: 1 60.0% 40.0%    NA
# 2: 2    NA 16.7% 83.3%
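
When the column names are already available as strings (for example, when looping over several variables), the substitute() step can be dropped. The following is only a minimal sketch of that variant: tw_share is a hypothetical helper name, not part of the answer above, and it returns numeric shares instead of formatted percentages so that scales is not needed.

```r
library(data.table)

# Same test data as in the question
test <- data.table(
  x = 0:10,
  y = c(rep(1, 5), rep(2, 6)),
  z = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C")
)

# Hypothetical helper: v1 and v2 are column names given as strings
tw_share <- function(dt, v1, v2) {
  # count rows per combination; keyby accepts a character vector
  tmp <- dt[, .(N = .N), keyby = c(v1, v2)]
  # row totals per level of v1
  tmp[, total := sum(N), by = c(v1)]
  tmp[, share := N / total]
  # build the casting formula from the two strings
  dcast(tmp, stats::as.formula(paste(v1, "~", v2)), value.var = "share")
}

res <- tw_share(test, "y", "z")
print(res)
```

Combinations that never occur (such as y = 1 with z = "C") come out as NA here, just as in the output above; dcast has a fill argument if 0 is preferred.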

Here is also a solution using xtabs and prop.table:

tw_tab <- function(x, v1, v2){
    fm <- bquote(~ .(substitute(v1)) + .(substitute(v2)))
    res <- prop.table(xtabs(formula = fm, data = x), 1)
    res <- as.data.frame.matrix(res)
    res[] <- lapply(res, scales::percent)
    return(res)
}

tw_tab(test, y, z)
#     A     B     C
# 1 60% 40.0%  0.0%
# 2  0% 16.7% 83.3%
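
To see what the two base R calls contribute before any formatting, the steps can be run outside a function. This is just an illustrative sketch on the question's test data; the object names tab and row_share are made up for the example.

```r
# Same y and z columns as in the question
test <- data.frame(
  y = c(rep(1, 5), rep(2, 6)),
  z = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C")
)

# Contingency table of counts: rows are y, columns are z
tab <- xtabs(~ y + z, data = test)

# margin = 1 divides each cell by its row total, giving row shares
row_share <- prop.table(tab, margin = 1)

print(tab)
print(round(row_share, 3))
```

Because xtabs produces a full table, combinations with no observations show up as 0 rather than NA.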

Answer 2

I would do...

row_pct = function(DT, fm){
  all = all.vars(fm)
  lhs = all.vars(fm[[2]])
  rhs = all.vars(fm[[3]])

  DT[, .N, by=all][, 
    p := percent(N/sum(N)), by=lhs][, 
    dcast(.SD, eval(fm), value.var = "p", fill = percent(0))]
}

Examples:

row_pct(test, y ~ z)

   y   A     B     C
1: 1 60%   40%    0%
2: 2  0% 16.7% 83.3%

row_pct(data.table(mtcars), cyl + gear ~ carb)

   cyl gear    1     2     3     4    6   8
1:   4    3 100%    0%    0%    0%   0%  0%
2:   4    4  50%   50%    0%    0%   0%  0%
3:   4    5   0%  100%    0%    0%   0%  0%
4:   6    3 100%    0%    0%    0%   0%  0%
5:   6    4   0%    0%    0%  100%   0%  0%
6:   6    5   0%    0%    0%    0% 100%  0%
7:   8    3   0% 33.3% 25.0% 41.7%   0%  0%
8:   8    5   0%    0%    0%   50%   0% 50%
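
The way row_pct splits the formula can be checked on its own. A small sketch using only base R: for a two-sided formula, element 2 is the left-hand side and element 3 is the right-hand side, and all.vars pulls the variable names out of each piece.

```r
fm <- cyl + gear ~ carb

all_vars <- all.vars(fm)       # every variable: "cyl" "gear" "carb"
lhs      <- all.vars(fm[[2]])  # left-hand side:  "cyl" "gear"
rhs      <- all.vars(fm[[3]])  # right-hand side: "carb"

print(all_vars)
print(lhs)
print(rhs)
```

This is why the counts are grouped by all variables while the percentages are computed within the left-hand-side (row) variables only.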

If for some reason you want to enter the row and column variables separately:

row_pct2 = function(DT, rowvars, colvar){
      fm  = substitute(`~`(rowvars, colvar))
      row_pct(DT, fm)
}

# Examples:
row_pct2(test, y, z)
row_pct2(data.table(mtcars), cyl + gear, carb)
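
What substitute builds in row_pct2 is an unevaluated call to `~`, not yet a formula object; it only becomes a formula when evaluated (which row_pct does via eval(fm) inside dcast). A short sketch to illustrate this, where make_fm is a hypothetical stand-in for the first line of row_pct2:

```r
# Hypothetical helper mirroring the substitute() line in row_pct2
make_fm <- function(rowvars, colvar) {
  substitute(`~`(rowvars, colvar))
}

fm <- make_fm(cyl + gear, carb)

print(fm)                      # the captured expression
print(is.call(fm))             # an unevaluated call ...
print(inherits(fm, "formula")) # ... not a formula yet

# all.vars works on the call directly, as row_pct relies on
print(all.vars(fm[[2]]))

# Evaluating the call yields a real formula object
f <- eval(fm)
print(inherits(f, "formula"))
```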

Answer 1 (score 2, accepted): 2017-08-31 13:46:10

Answer 2 (score 1): 2017-08-31 14:56:13