Daily mean by date column in an R data.table

I have the following data table:

DT <- structure(list(date = structure(c(18628, 18628, 18628, 18628, 
18628, 18628, 18628, 18628, 18628, 18628, 18628, 18628, 18628, 
18628, 18628, 18628, 18628, 18628, 18628, 18628, 18628, 18628, 
18628, 18628, 18629, 18629, 18629, 18629, 18629, 18629, 18629, 
18629, 18629, 18629, 18629, 18629, 18629, 18629, 18629, 18629, 
18629, 18629, 18629, 18629, 18629, 18629, 18629, 18629, 18630, 
18630, 18630, 18630, 18630, 18630, 18630, 18630, 18630, 18630, 
18630, 18630, 18630, 18630, 18630, 18630, 18630, 18630, 18630, 
18630, 18630, 18630, 18630, 18630, 18631, 18631, 18631, 18631, 
18631, 18631, 18631, 18631, 18631, 18631, 18631, 18631, 18631, 
18631, 18631, 18631, 18631, 18631), class = "Date"), Germany = c("50,87", 
"48,19", "44,68", "42,92", "40,39", "40,2", "39,63", "40,09", 
"41,27", "44,88", "45", "47,2", "50,78", "45,49", "44,73", "46,59", 
"52,99", "60,26", "60,61", "60,36", "57,4", "53,86", "53,45", 
"49,72", "46,69", "42,43", "41,09", "40", "37,55", "39", "42,09", 
"44,96", "48,45", "52", "52", "52,15", "55,95", "52", "50,69", 
"53,45", "59,99", "62", "63,08", "62,17", "60,03", "55,03", "52,25", 
"48,45", "46,11", "43", "39,55", "35,18", "33,45", "32,37", "31,7", 
"32,63", "36,9", "36,96", "36,96", "43,72", "47,71", "40,41", 
"39,66", "39,57", "36,11", "45,04", "52,56", "45,84", "35,83", 
"33,31", "34,98", "27,39", "29,33", "24,82", "24,65", "24,8", 
"27,71", "28,58", "37,04", "52,03", "55,76", "57,06", "57,18", 
"60", "61,27", "60,28", "60,07", "59,46", "61,99", "66,82"), 
    year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
    2021L, 2021L)), class = c("data.table", "data.frame"), row.names = c(NA, 
-90L))

Now I want to calculate the daily mean of the Germany column, grouped by the date column. When I use this:

dt.mean <- DT[, .(DE = mean(Germany)), by = "date"]

it gives me the following error:

Error in gmean(Germany) : 
  Type 'character' not supported by GForce sum (gsum). Either add the prefix base::sum(.) or turn off GForce optimization using options(datatable.optimize=1)

For this data table, the daily mean for 2021-01-01 is 48.3983333 and for 2021-01-02 it is 50.5625.

How can I solve this?

The Germany column holds character strings that use a comma as the decimal mark, which is why mean() fails. As Maël suggests, you need to convert these strings to numeric. You can do the conversion and compute the mean in data.table as follows:

dt.mean <- DT[, .(DE = mean(as.numeric(gsub(",", ".", Germany)))), by = "date"]

dt.mean

         date       DE
1: 2021-01-01 48.39833
2: 2021-01-02 50.56250
3: 2021-01-03 38.62250
4: 2021-01-04 47.15833
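
If Germany is needed as a number in later steps too, it may be simpler to convert the column once by reference and then run the grouped call from the question unchanged. The following is only a sketch of that idea: DT_small and dt_mean_small are made-up names, and the table is a small stand-in holding the first three Germany values of the first two dates in the question's data.

```r
library(data.table)

# Hypothetical stand-in for DT: first three values of 2021-01-01 and 2021-01-02
DT_small <- data.table(
  date    = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01",
                      "2021-01-02", "2021-01-02", "2021-01-02")),
  Germany = c("50,87", "48,19", "44,68", "46,69", "42,43", "41,09")
)

# Replace the decimal comma and overwrite the column in place with numeric values
DT_small[, Germany := as.numeric(sub(",", ".", Germany, fixed = TRUE))]

# mean() now receives a numeric column, so the original grouped call works
dt_mean_small <- DT_small[, .(DE = mean(Germany)), by = date]
```

Note that := modifies the table in place, so the original character values are lost unless you keep a copy or write to a new column.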

Using dplyr:

library(dplyr)

DT %>%
  # drop "." thousands separators (none in this data), then turn the decimal comma into "."
  mutate(Germany = as.numeric(gsub(",", ".", gsub("\\.", "", Germany)))) %>%
  group_by(date) %>%
  summarize(meanDay = mean(Germany, na.rm = TRUE)) %>%
  as.data.frame()

        date  meanDay
1 2021-01-01 48.39833
2 2021-01-02 50.56250
3 2021-01-03 38.62250
4 2021-01-04 47.15833
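
Both answers replace the comma before calling as.numeric(). A quick check on a few values taken from the question's data shows why that step seems to be needed: as.numeric() only accepts "." as the decimal mark. The variable names x, direct and fixed are just for illustration.

```r
# Three Germany values from the question, one of them without a decimal part
x <- c("50,87", "40,2", "45")

# Direct conversion: strings with a decimal comma become NA (R also emits a coercion warning)
direct <- suppressWarnings(as.numeric(x))

# Replacing the comma with a dot first gives the intended numbers
fixed <- as.numeric(sub(",", ".", x, fixed = TRUE))
```

Because the failed conversions turn into NA rather than raising an error, a mean computed with na.rm = TRUE on the directly converted values would silently use only the whole-number entries.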
