How do you take average values by date/state in a dataframe?

I have a data frame like this (head):

         state start_date   end_date    created_at cycle party answer candidate_name  pct survey_length
1      Florida 2020-11-02 2020-11-02 6/14/21 15:36  2020   REP  Trump   Donald Trump 48.0        0 days
2         Iowa 2020-11-01 2020-11-02 11/2/20 09:02  2020   REP  Trump   Donald Trump 48.0        1 days
3 Pennsylvania 2020-11-01 2020-11-02 11/2/20 12:49  2020   REP  Trump   Donald Trump 49.2        1 days
4      Florida 2020-11-01 2020-11-02 11/2/20 19:02  2020   REP  Trump   Donald Trump 48.2        1 days
5      Florida 2020-10-31 2020-11-02 11/4/20 09:17  2020   REP  Trump   Donald Trump 49.4        2 days
6       Nevada 2020-10-31 2020-11-02 11/4/20 10:38  2020   REP  Trump   Donald Trump 49.1        2 days

I want to take the average of the "pct" column for each month, by state. How would you do it? Would you use a for loop?

Here is a solution using group_by and summarize.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# simulated data
df <- expand_grid(
  state = c("fl", "io", "pa", "nv"),
  start_date = seq(mdy("1/1/2022"), by = "day", length.out = 300),
) %>% 
  mutate(pct = runif(nrow(.)))

# mean pct by month
df %>%
  mutate(mnth = floor_date(start_date, unit = "month")) %>%
  group_by(state, mnth) %>%
  summarize(pct = mean(pct), .groups = "drop")
#> # A tibble: 40 x 3
#>    state mnth         pct
#>    <chr> <date>     <dbl>
#>  1 fl    2022-01-01 0.443
#>  2 fl    2022-02-01 0.529
#>  3 fl    2022-03-01 0.570
#>  4 fl    2022-04-01 0.583
#>  5 fl    2022-05-01 0.477
#>  6 fl    2022-06-01 0.499
#>  7 fl    2022-07-01 0.497
#>  8 fl    2022-08-01 0.561
#>  9 fl    2022-09-01 0.467
#> 10 fl    2022-10-01 0.437
#> # ... with 30 more rows

Created on 2022-03-14 by the reprex package (v2.0.1)
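
The same approach can be sketched on the six rows shown in the question, so no for loop is needed. This is only an illustration under a few assumptions: the tibble name polls is made up for the example, only the state, start_date and pct columns are reproduced, start_date is assumed to be stored as (or convertible to) a Date, and the month is taken from start_date rather than end_date.

```r
library(dplyr)
library(lubridate)

# The six rows from the question, reduced to the columns needed here.
# `polls` is a hypothetical name; the question does not name its data frame.
polls <- tibble(
  state = c("Florida", "Iowa", "Pennsylvania", "Florida", "Florida", "Nevada"),
  start_date = ymd(c("2020-11-02", "2020-11-01", "2020-11-01",
                     "2020-11-01", "2020-10-31", "2020-10-31")),
  pct = c(48.0, 48.0, 49.2, 48.2, 49.4, 49.1)
)

# Assumption: the month is derived from start_date (not end_date).
monthly <- polls %>%
  mutate(mnth = floor_date(start_date, unit = "month")) %>%  # first day of the month
  group_by(state, mnth) %>%
  summarize(pct = mean(pct), .groups = "drop")

print(monthly)
```

With these rows, the two Florida polls that start in November 2020 (48.0 and 48.2) collapse into a single row with a mean of 48.1, while the Florida poll starting on 2020-10-31 stays in its own October group.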
