How to properly use do.call within a function?

I'm progressively transitioning from SAS to R, and at the moment I am trying to replicate what I used to do with macros.

I have a table that contains all my data (let's call it IDF_pop), and from this table I create two others, YVE_pop and EPCI_pop, which are subsets of the main table. I prefer creating separate tables, although I guess this might not be optimal. Here's how I proceed:

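## Packages used below. melt() is assumed to come from reshape2;
## %>%, filter(), group_by() and summarise() come from dplyr, stri_sub() from stringi.
library(reshape2)
library(dplyr)
library(stringi)

## Let's say the main table contains 10 lines.
## codgeo is the city's postal code, epci is the area, and I have three
## variables that describe different parts of the population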

codgeo <- c("75014","75020","78300","78520","78650","91200","91600","92500","93100","95230")
epci <- c("001","001","002","002","003","004","004","005","006","007")
pop0_15 <- c(10000*runif(10))
pop15_64 <- c(10000*runif(10))
pop65p <- c(10000*runif(10))

IDF_pop <- data.frame(codgeo,epci,pop0_15,pop15_64,pop65p)

## I'd like my population to be in one single column, for this I'll use melt

IDF_pop_line <- melt(IDF_pop,c("codgeo","epci"))

## Now I want to create separate tables for the Yvelines department (codgeo starts with 78) and for EPCI 002
## I could do it in two lines, but I wanted to practise writing functions, so here goes

localisation <- function(code_dep, lib_dep, code_epci, lib_epci){

  # Tables are named "<lib>_pop_line" so that they match IDF_pop_line
  # and can be found by agregation() below
  do.call("<<-",
          list(paste0(eval(lib_dep),"_pop_line"),
               IDF_pop_line %>% filter(stri_sub(codgeo,from=1,length=2)==code_dep)
          )
  )

  do.call("<<-",
          list(paste0(eval(lib_epci),"_pop_line"),
               IDF_pop_line %>% filter(epci==code_epci)
          )
  )

}

do.call("localisation",list("78","YVE","002","GPSO"))

With this, I have my 3 tables (IDF_, YVE_, GPSO_) and can now get to the main problem.

What I want to do next is summarise my tables. I'm trying to write a function that would work for all 3 tables.

I'd like it to be fully dependent on the parameter, but it seems that do.call won't accept a paste0 in its second argument.

## Aggregating the tables. I'll call the function 3 times, one for each level.

agregation <- function(lib){

  # This version doesn't work:

  do.call("<<-",
          list(paste0(eval(lib),"_pop_agr"),
               paste0(eval(lib),"_pop_line") %>%
                 group_by(variable) %>%
                 summarise(pop = sum(value))
          )
  )

}

do.call("agregation",list("IDF")) # This one doesn't work

agregation2 <- function(lib){

  do.call("<<-",
          list(paste0(eval(lib),"_pop_agr"),
               IDF_pop_line %>%
                 group_by(variable) %>%
                 summarise(pop = sum(value))
          )
  )

}

do.call("agregation2",list("IDF")) # This one does

As you can see, the only working approach I've found so far is to write out the full name of the table I'm aggregating. But this goes against the initial idea of having something that can be freely parameterised. How can I modify the first version of my function so that it works for all three possible parameters?

Lastly, I am aware that a simple workaround would have been to keep my IDF_pop_line table and filter at the last moment to create the 3 aggregated tables, but I prefer having separate tables from the get-go.

Thanks in advance for your help!

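In your agregation function, paste0(eval(lib),"_pop_line") returns the name of the data frame as a character string, not the data frame itself, so the pipe hands a string to group_by(). Use get() to fetch the object that has that name: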

agregation <- function(lib){

  do.call("<<-",
          list(paste0(eval(lib),"_pop_agr"),
               get(paste0(eval(lib),"_pop_line")) %>%
                 group_by(variable) %>%
                 summarise(pop = sum(value))
          )
  )

}
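
To make the difference between a name and the object concrete, here is a minimal base R sketch. The toy IDF_pop_line values and the helper agregation_base() are made up for illustration and are not part of the original code; it uses aggregate() instead of dplyr, and assign() instead of do.call("<<-", ...), which is one common way to create a variable from a string.

```r
# Toy stand-in for IDF_pop_line (values are invented for illustration)
IDF_pop_line <- data.frame(
  variable = c("pop0_15", "pop0_15", "pop65p"),
  value    = c(10, 20, 5)
)

lib <- "IDF"
tbl_name <- paste0(lib, "_pop_line")

is.character(tbl_name)        # TRUE: this is only the string "IDF_pop_line"
is.data.frame(get(tbl_name))  # TRUE: get() looks the object up by its name

# Hypothetical helper: same idea as agregation(), written with get()/assign().
# parent.frame() is the caller's environment, i.e. the global environment
# when the function is called from the top level.
agregation_base <- function(lib) {
  caller <- parent.frame()
  tbl <- get(paste0(lib, "_pop_line"), envir = caller)
  agr <- aggregate(value ~ variable, data = tbl, FUN = sum)
  names(agr)[names(agr) == "value"] <- "pop"
  assign(paste0(lib, "_pop_agr"), agr, envir = caller)
  invisible(agr)
}

agregation_base("IDF")
IDF_pop_agr
```

Calling agregation_base("YVE") would work the same way, provided a table named YVE_pop_line exists in the calling environment.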

Here is a suggestion using data.table.

It starts from the IDF_pop table as you create it above, before any of your functions are called.

library(data.table)

# make a data.table out of IDF_pop (by reference)
setDT( IDF_pop )

#create groups to summarise by
IDF_pop[ epci == "002", GSPO := TRUE][]
IDF_pop[ grepl("^78", codgeo) , YVE := TRUE][]

#melt and filter only values where a filter is TRUE
dt <- data.table::melt( IDF_pop, 
                        id.vars = c("codgeo", "epci", "pop0_15", "pop15_64", "pop65p"),
                        measure.vars = c("GSPO", "YVE"))[ value == TRUE,][]

Intermediate result (dt):

#    codgeo epci  pop0_15 pop15_64   pop65p variable value
# 1:  78300  002 6692.394 5441.225 4008.875     GSPO  TRUE
# 2:  78520  002 2128.604 6808.004 1889.822     GSPO  TRUE
# 3:  78300  002 6692.394 5441.225 4008.875      YVE  TRUE
# 4:  78520  002 2128.604 6808.004 1889.822      YVE  TRUE
# 5:  78650  003 8482.971 6556.482 5098.929      YVE  TRUE

Summarising code:

# now summarising is easy: sum all pop columns by variable group
dt[, lapply( .SD, sum), by = variable, .SDcols = names(dt)[grepl("^pop", names(dt) )] ]

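Final output (the populations come from runif() without a seed, so the figures change on every run and these totals do not match the intermediate table above):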

#    variable   pop0_15 pop15_64   pop65p
# 1:     GSPO  7171.683 5855.894 11866.55
# 2:      YVE 12602.153 8028.948 14364.21
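
If you still prefer separate tables, a related option is to keep them in a named list and apply one summarising function to each, which avoids building variable names with paste0() altogether. The following is only a base R sketch under assumptions: the four-row IDF_pop and its values are invented for illustration, and the names tables and agr are hypothetical.

```r
# Small made-up version of IDF_pop (fixed values instead of runif())
IDF_pop <- data.frame(
  codgeo  = c("75014", "78300", "78520", "91200"),
  epci    = c("001", "002", "002", "004"),
  pop0_15 = c(100, 200, 300, 400),
  pop65p  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Columns to sum: everything whose name starts with "pop"
pop_cols <- grep("^pop", names(IDF_pop), value = TRUE)

# One named list instead of three separately named global tables
tables <- list(
  IDF  = IDF_pop,
  YVE  = IDF_pop[substr(IDF_pop$codgeo, 1, 2) == "78", ],
  GPSO = IDF_pop[IDF_pop$epci == "002", ]
)

# The same summary is applied to every table; the result is again a named list
agr <- lapply(tables, function(tbl) colSums(tbl[pop_cols]))

agr$YVE  # totals for the Yvelines subset
```

Each level is then reachable by name (agr$IDF, agr$YVE, agr$GPSO), and adding a fourth level only means adding one element to tables.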
