
Using summarise with weighted mean from dplyr in R

I'm trying to tidy a dataset using dplyr. My variables contain percentages and plain counts (in this case, bounce rates and page views). I've tried to summarise them this way:

require(dplyr)
df<-df%>%
   group_by(pagename)%>%
   summarise(pageviews=sum(pageviews), bounceRate= weighted.mean(bounceRate,pageviews))

But this returns:

 Error: 'x' and 'w' must have the same length

My dataset does not have any NAs in either the page views or the bounce rates. I'm not sure what I'm doing wrong. Maybe summarise() doesn't work with weighted.mean()?

EDIT

I've added some data:

Source: local data frame [4 x 3]

  pagename bounceRate pageviews
     (chr)      (dbl)     (dbl)
1     url1   72.22222      1176
2     url2   46.42857       733
3     url2   76.92308       457
4     url3   62.06897       601
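A minimal sketch for reproducing the error, assuming a recent dplyr is installed: the data frame is rebuilt from the four printed rows, and `tryCatch()` captures the failure so it can be inspected. The wording of the error wrapper differs between dplyr versions, but the underlying message comes from `weighted.mean()`.

```r
library(dplyr)

# The four rows printed in the question
df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601)
)

# Original pipeline: pageviews is overwritten before weighted.mean() runs
res <- tryCatch(
  df %>%
    group_by(pagename) %>%
    summarise(pageviews = sum(pageviews),
              bounceRate = weighted.mean(bounceRate, pageviews)),
  error = function(e) e
)

inherits(res, "error")
#> [1] TRUE
```

The single-row groups (url1, url3) would pass the length check on their own; it is url2, with two bounce rates but only one summed pageviews value, that fails.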

summarise() evaluates its arguments in the order they appear, and each new column is visible to the expressions that follow it. Because you overwrite pageviews with its group total, weighted.mean() then receives that single total as its weights instead of the original per-row values. It's safer to use different names:

df %>%
   group_by(pagename)%>%
   summarise(pageviews_sum = sum(pageviews), 
      bounceRate_mean = weighted.mean(bounceRate,pageviews))
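The order-dependence described above can be seen on a toy data frame (the names `d`, `g` and `x` are hypothetical, assuming dplyr is installed). When `x` is overwritten by its group sum, `length(x)` afterwards sees a single value; when the sum goes into a new column, `length(x)` still sees the original rows.

```r
library(dplyr)

d <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# Reusing the name: later expressions see the summed x (one value per group)
reused <- d %>%
  group_by(g) %>%
  summarise(x = sum(x), n_x = length(x))
reused$n_x
#> [1] 1 1

# New name: the original per-row x is still available to later expressions
renamed <- d %>%
  group_by(g) %>%
  summarise(x_sum = sum(x), n_x = length(x))
renamed$n_x
#> [1] 2 1
```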

If you really want the original names, you can rename the columns afterward:

df %>%
   group_by(pagename) %>%
   summarise(pageviews_sum = sum(pageviews), 
      bounceRate_mean = weighted.mean(bounceRate,pageviews)) %>% 
   rename(pageviews = pageviews_sum, bounceRate = bounceRate_mean)

I've found the solution. Because pageviews = sum(pageviews) is evaluated before bounceRate = weighted.mean(bounceRate, pageviews), pageviews has already been reduced to one value per group by the time weighted.mean() runs. In any group with more than one row (url2 here), it is then shorter than bounceRate, which triggers the error.

The fix is simple: swap the two expressions so the weighted mean is computed first:

require(dplyr)
df<-df%>%
  group_by(pagename)%>%
  summarise(bounceRate= weighted.mean(bounceRate,pageviews),pageviews=sum(pageviews))
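As a sanity check, here is a sketch (assuming dplyr is installed, with the data rebuilt from the question) that runs both fixes, distinct names plus rename and the swapped order, and confirms they agree. For url2 the weighted bounce rate is (46.42857 × 733 + 76.92308 × 457) / 1190, roughly 58.139.

```r
library(dplyr)

df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601)
)

# Fix 1: distinct names, renamed afterwards
renamed <- df %>%
  group_by(pagename) %>%
  summarise(pageviews_sum = sum(pageviews),
            bounceRate_mean = weighted.mean(bounceRate, pageviews)) %>%
  rename(pageviews = pageviews_sum, bounceRate = bounceRate_mean)

# Fix 2: weighted mean first, while pageviews still holds per-row values
swapped <- df %>%
  group_by(pagename) %>%
  summarise(bounceRate = weighted.mean(bounceRate, pageviews),
            pageviews = sum(pageviews))

# Same numbers; only the column order differs
all.equal(as.data.frame(renamed[, c("pagename", "bounceRate", "pageviews")]),
          as.data.frame(swapped))
#> [1] TRUE

round(swapped$bounceRate[swapped$pagename == "url2"], 3)
#> [1] 58.139
```

Distinct names are the more robust choice: with the swapped order, any expression added after pageviews = sum(pageviews) would again see the summed value.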
