
Using R, how can I apply a function (moments::skewness) that is not a basic aggregate function to a group-by table?

I have a set of crowdsourced weather data in Redshift: many stations, each covering many days, each day with 20 to 100 observations, and each observation with several variables. I am computing daily statistics. It works like this:

dailyn <- cwoparchive %>%
  filter(stationname == "EW2020") %>%
  group_by(archivedate) %>%
  summarise(ncount = n(), meanlat = mean(latitude), meanlon = mean(longitude)) %>%
  collect() %>%
  data.frame()

This returns just what I want:

   archivedate ncount  meanlat   meanlon
1   2013-02-06      2 38.82667 -76.79884
2   2013-03-19     22 38.82700 -76.79816
3   2013-03-21     45 38.82700 -76.79816
4   2013-03-22     49 38.82699 -76.79819
5   2013-03-24     63 38.82690 -76.79836
6   2013-03-27     62 38.82691 -76.79834
7   2013-03-28     48 38.82700 -76.79816
8   2013-03-29     45 38.82700 -76.79816
9   2013-03-30     39 38.82700 -76.79816
10  2013-04-01     49 38.82697 -76.79823
etc.

Next I want to know if mornings are sunnier. But when I do the same thing as above, replacing only the summarise line with this one:

summarise(askew=skewness(linterpreted))

I get an error message: "Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not Retrieve the result : ERROR: function skewness(integer) does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts." Yet the skewness function works fine on the same data in normal, non-grouped data frames.

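With apologies for bothering everybody, I worked it out. First download the data, then group by in a follow-on step, and then summarise on that. This seems irrational, but it is necessary: while the table is still remote, dplyr translates summarise() into SQL and passes skewness() on to Redshift, which has no such function. After collect(), the data is a local data frame and skewness() runs in R.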

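library(dplyr)
library(moments)  # provides skewness()

# Filter on the server, then pull the rows into a local data frame.
stationdata <- cwoparchive %>%
  filter(stationname == "EW2020") %>%
  collect() %>%
  data.frame()

# Group and summarise locally, where skewness() is available.
station_by_day <- group_by(stationdata, archivedate)

skew_by_day <- summarise(station_by_day, count = n(), askew = skewness(linterpreted))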

This produces the ideal result:

  archivedate count      askew
       (date) (int)      (dbl)
1  2013-02-06     2  0.0000000
2  2013-03-01     5 -0.3755537
3  2013-03-19    22 -0.2498925
4  2013-03-20    38 -0.3328628
5  2013-03-21    45  0.7237873
etc.
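
If downloading every observation ever becomes too costly, one possible alternative (not part of the answer above) is to express skewness through plain averages, since averages are basic aggregates. moments::skewness uses the population formula m3 / m2^(3/2), and both central moments can be rewritten in terms of mean(x), mean(x^2) and mean(x^3). The sketch below only checks that identity in base R on made-up data; the helper names skew_direct and skew_from_means and the toy values are hypothetical.

```r
# Population skewness, the same formula moments::skewness uses:
# m3 / m2^(3/2), with m_k = mean((x - mean(x))^k).
skew_direct <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# The same quantity built only from the averages of x, x^2 and x^3.
skew_from_means <- function(x) {
  mu <- mean(x)
  e2 <- mean(x^2)
  e3 <- mean(x^3)
  m2 <- e2 - mu^2                    # second central moment
  m3 <- e3 - 3 * mu * e2 + 2 * mu^3  # third central moment
  m3 / m2^1.5
}

# Made-up observations for two days (hypothetical values).
toy <- data.frame(
  archivedate  = rep(c("2013-03-19", "2013-03-21"), each = 5),
  linterpreted = c(1, 2, 2, 3, 10, 4, 5, 5, 6, 0)
)

# One row per day, computing the skewness both ways for comparison.
by_day <- data.frame(
  archivedate = sort(unique(toy$archivedate)),
  direct      = as.numeric(tapply(toy$linterpreted, toy$archivedate, skew_direct)),
  from_means  = as.numeric(tapply(toy$linterpreted, toy$archivedate, skew_from_means))
)
print(by_day)
```

On a remote table the same three averages could, in principle, be requested inside summarise() before collect(). That variant is untested here, and an integer column such as linterpreted would probably need a cast to floating point first. The raw-moment form can also lose precision when the values are large relative to their spread.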
