I have two data frames:
df1
time x y state
... ... ... CA
... ... ... MA
... ... ... TX
... ... ... MA
... ... ... CA
... ... ... IL
df2
time x y state
... ... ... MA
... ... ... NY
... ... ... MA
... ... ... TX
... ... ... CA
... ... ... CA
I then have about 50 lines of code in which I aggregate the monthly values, rename columns, match the data against another list, and finally merge df1 and df2 into one data frame. So far, this code does not take state into account.
However, I need to create subsets of the merged data frame for several US states. Is there a more elegant way than copy/pasting the code used for df1 and df2 and replacing df1 and df2 with df1_CA, df2_MA, etc.?
A loop? Panel data?
One option could be to use the data.table package for the grouped analyses.
library(data.table)
# transform your data.frame to data.table
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)
# e.g. grouping values on state level
dt1[, sum(y), by = state]
# this sums all y values within each state
If you don't want to copy the code and swap the data frame name each time, you could define a function:
# define the function
accumulate <- function(df) {
  dt <- as.data.table(df)
  return(dt[, sum(y), by = state])
}
# and call it
accumulate(df1)
accumulate(df2)
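The same idea extends to the per-state subsets asked about in the question: pass the state as a function argument instead of hard-coding it. The following is only a minimal base R sketch. The toy values and the name process_state are made up for illustration, and the function body is a placeholder for the real 50-line pipeline.

```r
# Toy data standing in for df1 and df2 (values are invented for illustration)
df1 <- data.frame(time = 1:6, x = 1:6, y = c(10, 20, 30, 40, 50, 60),
                  state = c("CA", "MA", "TX", "MA", "CA", "IL"))
df2 <- data.frame(time = 1:6, x = 6:1, y = c(1, 2, 3, 4, 5, 6),
                  state = c("MA", "NY", "MA", "TX", "CA", "CA"))

# Hypothetical placeholder for the existing pipeline: restrict both inputs
# to one state, aggregate each, and combine the results into one row
process_state <- function(a, b, st) {
  a_st <- a[a$state == st, ]
  b_st <- b[b$state == st, ]
  data.frame(state = st, y1 = sum(a_st$y), y2 = sum(b_st$y))
}

# One call per state, no copy/pasting of the pipeline
res_ca <- process_state(df1, df2, "CA")
res_ma <- process_state(df1, df2, "MA")
```

Only the filtering step depends on the state, so the rest of the existing code can move into the function body unchanged.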
Instead of a for loop over all your data.frames, you could use one of the apply functions, which iterate over data structures such as lists:
# alternatively define a list of data.frames and then iterate over the list
my.dfs <- list(df1, df2)
# pass the function itself, not a call to it
lapply(my.dfs, accumulate)
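To run a two-input pipeline once per state, one possible approach is to split both data frames by state and iterate over the matching pieces with Map. This is a sketch under assumptions: the toy values are invented, and run_pipeline is a hypothetical stand-in for the existing aggregate/rename/merge code.

```r
# Toy data standing in for df1 and df2 (values are invented for illustration)
df1 <- data.frame(time = 1:6, x = 1:6, y = c(10, 20, 30, 40, 50, 60),
                  state = c("CA", "MA", "TX", "MA", "CA", "IL"))
df2 <- data.frame(time = 1:6, x = 6:1, y = c(1, 2, 3, 4, 5, 6),
                  state = c("MA", "NY", "MA", "TX", "CA", "CA"))

# Hypothetical stand-in for the real pipeline; it receives the two
# per-state subsets and returns one combined result
run_pipeline <- function(a, b) {
  data.frame(n1 = nrow(a), n2 = nrow(b), y_total = sum(a$y) + sum(b$y))
}

# Split each data frame into a named list with one element per state
by_state1 <- split(df1, df1$state)
by_state2 <- split(df2, df2$state)

# Keep only the states that occur in both inputs
common <- intersect(names(by_state1), names(by_state2))

# Apply the pipeline pairwise; the result is a list named by state
results <- Map(run_pipeline, by_state1[common], by_state2[common])
```

Individual results are then available as results$CA, results$MA and so on, and do.call(rbind, results) stacks them into a single data frame if that is more convenient.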