R - Aggregate data with variable name using dcast

Python guy new to R, so forgive the naive question.

I have an R dataframe named metrics with four columns:

I want to pass the level of aggregation (day or week) to dcast as a variable.

agg_level <- c("week")

If I hard-code week in the function, it aggregates the data for each week correctly:

  • met <- dcast(metrics, week ~ city, value.var = count, fun.aggregate = sum)
  • Output:

week        NYC  CHI  SF
2015-10-18    1    2   3
2015-10-25    4    5   6

If I replace week with the variable, it fails. (It aggregates data for all weeks.)

  • met <- dcast(metrics, agg_level ~ city, value.var = count, fun.aggregate = sum)

  • Output:

agg_level  NYC  CHI  SF
week         5    7   9

Based on this, metrics[[agg_level]] extracts the column named by a variable, but using it in the formula fails:

  • met <- dcast(m, [[agg_level]] ~ city, value.var = metric, fun.aggregate = sum)

  • Error in (function ... unexpected '[['

What is the correct way to do this?

The formula argument of dcast expects the names on each side of ~ to be columns of the data frame being cast. It does not use the value of agg_level to look up the week column; as the output in the question shows, the string "week" ends up as a single constant group. As such, you have two options:

# Option 1
# Do some text operations to make the formula based on variables.
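# this and that are placeholders for whatever condition selects the level.
agg_level <- if (this == that) 'week' else 'day'
myFormula <- sprintf("%s ~ city", agg_level)
# value.var takes the value column's name as a string ("count" in the question's working call).
met <- dcast(metrics, as.formula(myFormula), sum, value.var = "count")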

# Option 2 - Untested
# Use dcast's alternative to the formula notation and pass a list of quoted variables instead.
# .() is the quoting helper from plyr. No idea if this will work.
met <- dcast(metrics, list(.(agg_level),.(city)), sum, value.var = "count")
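For reference, here is a minimal, self-contained sketch of Option 1. The sample data is made up for illustration (the question does not show the actual contents of metrics), and it assumes the value column is called count, as in the question's hard-coded call.

```r
library(reshape2)  # provides dcast()

# Hypothetical sample data; column names follow the question, values are invented.
metrics <- data.frame(
  day   = c("2015-10-18", "2015-10-19", "2015-10-25", "2015-10-26"),
  week  = c("2015-10-18", "2015-10-18", "2015-10-25", "2015-10-25"),
  city  = c("NYC", "CHI", "NYC", "CHI"),
  count = c(1, 2, 4, 5),
  stringsAsFactors = FALSE
)

agg_level <- "week"  # or "day"

# Build the formula as text first, then convert it, so dcast sees week ~ city
# rather than the literal name agg_level.
cast_formula <- as.formula(paste(agg_level, "~ city"))

# One row per value of the chosen column, one column per city, summing count.
met <- dcast(metrics, cast_formula, value.var = "count", fun.aggregate = sum)
print(met)
```

Setting agg_level to "day" and rebuilding cast_formula gives one row per day instead, without touching the dcast call.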
