
Why doesn't R read my columns as numeric?

I have a data frame grouped by hour and day, with three attributes: calories, steps and intensity. All of them are of type int; I have checked this many times with glimpse(). I want to build a plot from this data frame, but the plot doesn't change when I change the attribute. I added a fill based on intensity and found the problem: ggplot2 seems to count the number of rows, which is why the plot never changes.

Here is a dput() of the data frame; only the tidyverse package is needed:

structure(list(id = c(7007744171, 2347167796, 8053475328, 8877689391, 
8877689391, 7007744171, 8053475328, 7086361926, 8053475328, 8877689391, 
7007744171, 8053475328, 8053475328, 8253242879, 7086361926, 8053475328, 
8877689391, 2022484408, 8053475328, 8053475328), hour = c(8, 
8, 19, 17, 18, 8, 19, 17, 19, 16, 8, 19, 21, 10, 13, 14, 12, 
9, 19, 14), day = structure(c(1L, 6L, 4L, 3L, 3L, 3L, 6L, 3L, 
3L, 4L, 4L, 2L, 7L, 7L, 2L, 4L, 2L, 1L, 5L, 7L), .Label = c("Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
), class = "factor"), calories = c(353L, 317L, 413L, 505L, 497L, 
336L, 379L, 512L, 357L, 397L, 293L, 335L, 334L, 251L, 279L, 330L, 
353L, 338L, 323L, 321L), steps = c(4904L, 4752L, 4706L, 4606L, 
4328L, 4247L, 4127L, 4089L, 3794L, 3705L, 3660L, 3553L, 3451L, 
3440L, 3401L, 3396L, 3387L, 3322L, 3302L, 3280L), intensity = c(138L, 
117L, 121L, 107L, 107L, 123L, 101L, 143L, 91L, 72L, 105L, 87L, 
87L, 71L, 81L, 79L, 82L, 99L, 83L, 86L), status = structure(c(2L, 
2L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("Sedentary User", "Light User", "Heavy User"
), class = "factor")), row.names = c(NA, 20L), class = "data.frame")

Here is the code of the plot:

ggplot(data=week_hourly,
   mapping=aes(x=hour, y=intensity, fill = intensity, alpha=hour)) +
  geom_col() + 
  coord_flip() +  
  scale_fill_gradient(low = "#8a2380", high = "#f27121") + 
  scale_alpha(range=c(0.7,1), guide="none") + 
  labs(title="Intensity per Hour", 
  subtitle="Through the week", x="Hour", y= "Intensity") +
  theme(legend.position = "top") +
  scale_x_continuous(breaks=seq(0,23,4)) + 
  facet_grid(status~day)

And here is the result:
[image: the resulting plot]

As you can see, the fill doesn't reflect the individual values of intensity, and the X axis of the plot runs up to 400 although the maximum value of intensity is 165. I've already tried converting the columns with as.integer, as.numeric and other methods, but nothing helps.

@MrFlick diagnosed the problem correctly:

It looks like you have not summarized your data before plotting: you have multiple values per day/hour, and those values are being stacked.
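To see the stacking directly, here is a minimal sketch in base R (so it runs without extra packages). The data frame week_hourly_toy is a hypothetical three-row subset copied from the dput() above, not the full data. It counts the rows in each status/day/hour cell and computes the height geom_col() would draw for that cell, which is the sum of the stacked values:

```r
# Toy subset: three rows copied from the dput() above
# (all Wednesday, all "Heavy User"). The name is made up for illustration.
week_hourly_toy <- data.frame(
  hour      = c(17, 17, 18),
  day       = factor(c("Wednesday", "Wednesday", "Wednesday")),
  status    = factor(c("Heavy User", "Heavy User", "Heavy User")),
  intensity = c(107L, 143L, 107L)
)

# Number of rows in each status/day/hour cell; anything above 1 gets stacked.
rows_per_cell <- aggregate(intensity ~ status + day + hour,
                           data = week_hourly_toy, FUN = length)
names(rows_per_cell)[4] <- "n_rows"

# Height of the stacked bar for each cell: the sum of its intensity values.
stacked_height <- aggregate(intensity ~ status + day + hour,
                            data = week_hourly_toy, FUN = sum)

print(rows_per_cell)
print(stacked_height)
```

In this subset, hour 17 has two rows (107 and 143), so its bar reaches 250 even though no single intensity value is that large. The same effect on the full data would explain an axis that extends well past the maximum intensity.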

One sensible way to summarize your data (i.e., collapse all the intensity measurements for a particular status/day/hour combination to a single value) would be:

library(tidyverse)
# Collapse each status/day/hour combination to its mean intensity
wh_sum <- week_hourly %>% 
          group_by(status, day, hour) %>% 
          summarise(across(intensity, mean), .groups = "drop")

You could probably also do this on the fly with stat_summary() :

ggplot(data=week_hourly,
   mapping=aes(x=hour, y=intensity, fill = intensity, alpha=hour)) +
  stat_summary(fun = mean, geom = "col") + ...
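A fuller sketch of the stat_summary() route, under some assumptions: it uses a hypothetical three-row data frame (toy) instead of week_hourly, assumes ggplot2 3.3.0 or later (where the argument is fun and after_stat() is available), and maps the fill to the computed mean with after_stat(y) rather than to the raw intensity column. That last choice is my substitution, not part of the original answer; the idea is that each bar should be coloured by the summarised value it displays.

```r
library(ggplot2)

# Hypothetical toy data: hour 17 has two measurements, hour 18 has one.
toy <- data.frame(
  hour      = c(17, 17, 18),
  intensity = c(107L, 143L, 107L)
)

# stat_summary() collapses the rows at each hour to their mean before drawing,
# so nothing is stacked. after_stat(y) refers to that computed mean.
p <- ggplot(toy, aes(x = hour, y = intensity)) +
  stat_summary(aes(fill = after_stat(y)), fun = mean, geom = "col") +
  scale_fill_gradient(low = "#8a2380", high = "#f27121") +
  coord_flip()

print(p)
```

With these rows, the bar for hour 17 should sit at the mean of 107 and 143 rather than at their sum. Facets, alpha and labels from the original plot can be added back on top of this.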

