
Calculate average based on columns in 2 data frames and their values via mutate in R?

I have a data frame that counts how many of each Response.Status occur per month, using this mutate pipeline:

library(dplyr)
library(tidyr)

DF1 <- complete_df %>% 
  mutate(Month = format(as.Date(date, format = "%Y/%m/%d"), "%m/%Y"),   # e.g. "01/2020"
         UNSUBSCRIBE = if_else(UNSUBSCRIBE == "TRUE", "UNSUBSCRIBE", NA_character_)) %>% 
  pivot_longer(c(Response.Status, UNSUBSCRIBE), values_to = "Response.Status") %>%  # unsubscribes become their own status
  drop_na() %>% 
  count(Month, Response.Status) %>%                   # rows per month and status
  pivot_wider(names_from = Month, values_from = n)    # one column per month

# A tibble: 7 x 16
  Response.Status        `01/2020` `02/2020` `03/2020` `04/2020` `05/2020` `06/2020` `07/2020` `08/2020` `09/2019` `09/2020` `10/2019` `10/2020` `11/2019` `11/2020` `12/2019`
  <chr>                      <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>
1 EMAIL_OPENED                1068      3105      4063      4976      2079      1856      4249      3638       882      4140       865      2573      1167       684       862
2 NOT_RESPONDED               3187      9715     13164     15239      5458      4773     12679     10709      2798     15066      2814      8068      3641      1931      2647
3 PARTIALLY_SAVED                5        34        56         8        28        22        73        86        11        14         7        23         8         8         2
4 SUBMITTED                    216       557       838       828       357       310       654       621       214      1001       233       497       264       122       194
5 SURVEY_OPENED                164       395       597      1016       245       212       513       625       110       588       123       349       202        94       120
6 UNDELIVERED_OR_BOUNCED        92       280       318       260       109       127       319       321        63       445        69       192        93        39        74
7 UNSUBSCRIBE                  397      1011      1472      1568       727       737      1745      2189       372      1451       378       941       429       254       355
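To see what each step of the counting pipeline does, here is a minimal sketch on a tiny hypothetical stand-in for complete_df (the four rows in toy_df are invented for illustration; only the column names come from the question). An unsubscribe flag is stacked under Response.Status, so that row is counted once as its original status and once as UNSUBSCRIBE.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for complete_df (column names from the question)
toy_df <- data.frame(
  date = c("2020/01/05", "2020/01/20", "2020/02/03", "2020/02/15"),
  Response.Status = c("EMAIL_OPENED", "SUBMITTED", "EMAIL_OPENED", "EMAIL_OPENED"),
  UNSUBSCRIBE = c("TRUE", "FALSE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

counts <- toy_df %>%
  mutate(Month = format(as.Date(date, format = "%Y/%m/%d"), "%m/%Y"),
         # turn the flag into a status label, NA when not unsubscribed
         UNSUBSCRIBE = if_else(UNSUBSCRIBE == "TRUE", "UNSUBSCRIBE", NA_character_)) %>%
  # stack both columns into one Response.Status column (plus a "name" column)
  pivot_longer(c(Response.Status, UNSUBSCRIBE), values_to = "Response.Status") %>%
  drop_na() %>%                       # removes the NA rows created above
  count(Month, Response.Status) %>%   # column n = number of rows per month/status
  pivot_wider(names_from = Month, values_from = n)

print(counts)
```

Note that drop_na() with no arguments drops rows with an NA in any column; if complete_df has other columns containing NAs, drop_na(Response.Status) limits the filter to the stacked status column.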

What I want to do is calculate the average of these values per person, based on the number of people in each Response.Status group.

structure(list(Response.Status = c("EMAIL_OPENED", "NOT_RESPONDED", 
"PARTIALLY_SAVED", "SUBMITTED", "SURVEY_OPENED", "UNDELIVERED_OR_BOUNCED"
), `01/2020` = c(1068L, 3187L, 5L, 216L, 164L, 92L), `02/2020` = c(3105L, 
9715L, 34L, 557L, 395L, 280L), `03/2020` = c(4063L, 13164L, 56L, 
838L, 597L, 318L), `04/2020` = c(4976L, 15239L, 8L, 828L, 1016L, 
260L), `05/2020` = c(2079L, 5458L, 28L, 357L, 245L, 109L), `06/2020` = c(1856L, 
4773L, 22L, 310L, 212L, 127L), `07/2020` = c(4249L, 12679L, 73L, 
654L, 513L, 319L), `08/2020` = c(3638L, 10709L, 86L, 621L, 625L, 
321L), `09/2019` = c(882L, 2798L, 11L, 214L, 110L, 63L), `09/2020` = c(4140L, 
15066L, 14L, 1001L, 588L, 445L), `10/2019` = c(865L, 2814L, 7L, 
233L, 123L, 69L), `10/2020` = c(2573L, 8068L, 23L, 497L, 349L, 
192L), `11/2019` = c(1167L, 3641L, 8L, 264L, 202L, 93L), `11/2020` = c(684L, 
1931L, 8L, 122L, 94L, 39L), `12/2019` = c(862L, 2647L, 2L, 194L, 
120L, 74L)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))

I made a separate table containing the total number of people for each of these groups:

Response.Status
EMAIL_OPENED          : 451  
NOT_RESPONDED         : 1563  
PARTIALLY_SAVED       :   4  
SUBMITTED             :  71  
SURVEY_OPENED         :  53  
UNDELIVERED_OR_BOUNCED:  47
UNSUBSCRIBE           : 135
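Put as arithmetic, the per-person average for a status is its total across all months divided by the number of people in that group; for EMAIL_OPENED that is 36207 / 451, roughly 80.3. A minimal base-R sketch of the same calculation, using only two of the rows (EMAIL_OPENED and SUBMITTED) copied from the tables above; the object names counts, people, totals and mean_pp are illustrative:

```r
# Monthly counts for two statuses, copied from the table above
counts <- rbind(
  EMAIL_OPENED = c(1068, 3105, 4063, 4976, 2079, 1856, 4249, 3638, 882, 4140, 865, 2573, 1167, 684, 862),
  SUBMITTED    = c(216, 557, 838, 828, 357, 310, 654, 621, 214, 1001, 233, 497, 264, 122, 194)
)
# Number of people per status, from the separate table
people <- c(EMAIL_OPENED = 451, SUBMITTED = 71)

totals  <- rowSums(counts)                      # sum across all months
mean_pp <- totals / people[rownames(counts)]    # match people by name, then divide
print(round(mean_pp, 1))
```

Indexing people by rownames(counts) keeps the division aligned even if the two tables list the statuses in a different order.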

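If I understand your question correctly, you have 2 data.frames/tibbles: the one shown in the "structure" part, and another that gives the number of people/users for each Response.Status. Now you want the value per person. If so, here is a possible solution: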

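library(dplyr)  # also provides %>%
library(tidyr)

# people/users data set: number of people in each Response.Status group
df2 <- data.frame(Response.Status = c("EMAIL_OPENED", "NOT_RESPONDED", "PARTIALLY_SAVED", "SUBMITTED", "SURVEY_OPENED", "UNDELIVERED_OR_BOUNCED", "UNSUBSCRIBE"),
                  PEOPLE = c(451, 1563, 4, 71, 53, 47, 135))

df %>% # df is the data frame from your structure() output
  tidyr::pivot_longer(-Response.Status, names_to = "DATE", values_to = "nmbr") %>%  # one row per status and month
  dplyr::group_by(Response.Status) %>% 
  dplyr::summarise(SUM = sum(nmbr)) %>%                    # total across all months
  dplyr::inner_join(df2, by = "Response.Status") %>%       # keeps statuses present in both tables
  dplyr::mutate(MEAN_PP = SUM / PEOPLE)                    # average per person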

  Response.Status           SUM PEOPLE MEAN_PP
  <chr>                   <int>  <dbl>   <dbl>
1 EMAIL_OPENED            36207    451    80.3
2 NOT_RESPONDED          111889   1563    71.6
3 PARTIALLY_SAVED           385      4    96.2
4 SUBMITTED                6906     71    97.3
5 SURVEY_OPENED            5353     53   101  
6 UNDELIVERED_OR_BOUNCED   2801     47    59.6
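The answer above averages over the whole period. If the goal is instead a per-person value for each month (an assumption; the question does not say which is meant), mutate() with across() can divide every month column by the group's people count. A minimal sketch on a two-status, two-month slice of the question's data (df_slice and people are illustrative names):

```r
library(dplyr)

# Small slice of the question's data: two statuses, two months
df_slice <- tibble(
  Response.Status = c("EMAIL_OPENED", "SUBMITTED"),
  `01/2020` = c(1068L, 216L),
  `02/2020` = c(3105L, 557L)
)
people <- tibble(Response.Status = c("EMAIL_OPENED", "SUBMITTED"),
                 PEOPLE = c(451, 71))

per_person <- df_slice %>%
  inner_join(people, by = "Response.Status") %>%
  # divide every month column by the PEOPLE value on the same row
  mutate(across(-c(Response.Status, PEOPLE), ~ .x / PEOPLE)) %>%
  select(-PEOPLE)

print(per_person)
```

Because the join happens before mutate(), each row's month values are divided by that row's own PEOPLE value rather than by a recycled vector.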


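Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.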

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM