I'm progressively transitioning from SAS to R, and at the moment I am trying to replicate what I used to do with macros.
I have a table that contains all my data (let's call it IDF_pop) and from this table I create two other : YVE_pop and EPCI_pop, which are two subsets from the main table. I prefer creating separate tables, but I guess this might not be optimal. Here's how I proceed :
## Let's say the main table contains 10 lines.
## codgeo is the city's postal code, epci is the area, and I have three
## variables that describe different parts of the population
codgeo <- c("75014","75020","78300","78520","78650","91200","91600","92500","93100","95230")
epci <- c("001","001","002","002","003","004","004","005","006","007")
pop0_15 <- c(10000*runif(10))
pop15_64 <- c(10000*runif(10))
pop65p <- c(10000*runif(10))
IDF_pop <- data.frame(codgeo,epci,pop0_15,pop15_64,pop65p)
## I'd like my population to be in one single column, for this I'll use melt
IDF_pop_line <- melt(IDF_pop,c("codgeo","epci"))
## Now I want to create separate tables for the Yvelines department (codgeo starts with 78) and for EPCI 002
## I could do it in two lines but I wanted to train using functions so here goes
localisation <- function(code_dep, lib_dep, code_epci, lib_epci){
do.call("<<-",
list(paste0(eval(lib_dep),"_pop_ligne"),
IDF_pop_line %>% filter(stri_sub(codgeo,from=1,length=2)==code_dep)
)
)
do.call("<<-",
list(paste0(eval(lib_epci),"_pop_ligne"),
IDF_pop_line %>% filter(epci==code_epci)
)
)
}
do.call("localisation",list("78","YVE","002","GPSO"))
With this, I have my 3 tables (IDF_, YVE_, GPSO_) and can now get to the main problem.
What I want to do next is summarise my tables. I'm trying to write a function that would work for all 3 tables.
I'd like it to be fully dependent on the parameter, but it seems that do.call won't accept a paste0 in its second argument.
## Aggregating the tables. I'll call the function 3 times, one for each level.
agregation <- function(lib){
# This doesn't :
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
paste0(eval(lib),"_pop_line") %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
do.call("agregation",list("IDF")) # This one doesn't work
agregation2 <- function(lib){
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
IDF_pop_line %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
do.call("agregation2",list("IDF")) # This one does
As you can see, the only working way I've found as of now is to write the full name of the table I'm using for aggregation. But this goes against the initial idea of having something that can be freely parametered. How can I modify the first version of my function, in a way that will make it work for all three possible parameters ?
Lastly, I am aware that a simple workaround would have been to keep my IDF_pop_line table and filter at the last moment to create the 3 aggregated tables, but I prefer having separate tables from the get-go.
Thanks in advance for your help !
In your agregation
function string paste0(eval(lib),"_pop_line")
returns a name of dataframe not dataframe itself. Try get
agregation <- function(lib){
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
get(paste0(eval(lib),"_pop_line")) %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
Here is a suggestion using data.table
.
You can use the IDF_pop
you create before entering all functions.
library(data.table)
#make adata.table out of YVE_pop_ligne
setDT( IDF_pop )
#create groups to summarise by
IDF_pop[ epci == "002", GSPO := TRUE][]
IDF_pop[ grepl("^78", codgeo) , YVE := TRUE][]
#melt and filter only values where a filter is TRUE
dt <- data.table::melt( IDF_pop,
id.vars = c("codgeo", "epci", "pop0_15", "pop15_64", "pop65p"),
measure.vars = c("GSPO", "YVE"))[ value == TRUE,][]
in between result (dt)
# codgeo epci pop0_15 pop15_64 pop65p variable value
# 1: 78300 002 6692.394 5441.225 4008.875 GSPO TRUE
# 2: 78520 002 2128.604 6808.004 1889.822 GSPO TRUE
# 3: 78300 002 6692.394 5441.225 4008.875 YVE TRUE
# 4: 78520 002 2128.604 6808.004 1889.822 YVE TRUE
# 5: 78650 003 8482.971 6556.482 5098.929 YVE TRUE
code
#now summarising is easy, sum by varianle-group on all pop-columns
dt[, lapply( .SD, sum), by = variable, .SDcols = names(dt)[grepl("^pop", names(dt) )] ]
final output
# variable pop0_15 pop15_64 pop65p
# 1: GSPO 7171.683 5855.894 11866.55
# 2: YVE 12602.153 8028.948 14364.21
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.