
Error: only defined on a data frame with all numeric variables with ddply on large dataset

I'm trying to calculate sums and means on a large dataset (~22,000 records) for several parameters (e.g. Er_Count, Mn_Count) by month, year, survey ID and grid ID. I initially tried this code to get overall sums:

dlply(Effort_All,c("Er_Count","Mn_Count","Bp_Count"),sum)

And received the following error:

Error: only defined on a data frame with all numeric variables

Since I cannot even get overall sums, I cannot get statistics by the specific variables either. Do I need to split the data in some manner?

I have included a sample of the dataset (6 records) below.

    structure(list(Grid_ID = structure(c(527L, 92L, 331L, 395L, 934L, 
    93L), .Label = c("1", "1,000", "1,001", "1,002", "1,003", "1,004", 
"1,005", "1,006", "1,007", "1,008", "1,009", "1,010", "1,011", 
"1,012", "1,013", "1,014", "1,015", "1,016", "1,017", "1,018", 
"1,019", "1,020", "1,021", "1,022", "1,023", "1,024", "1,025", 
"1,026", "1,027", "1,028", "1,029", "1,030", "1,031", "1,032", 
"1,033", "1,034", "1,035", "1,036", "1,037", "1,038", "1,039", 
"1,040", "1,041", "1,042", "1,043", "1,044", "1,045", "1,046", 
"1,047", "1,048", "1,049", "1,050", "1,051", "1,052", "1,053", 
"1,054", "1,055", "1,056", "1,057", "1,058", "1,059", "1,060", 
"1,061", "10", "100", "101", "102", "103", "104", "105", "106", 
"107", "108", "109", "11", "110", "111", "112", "113", "114", 
"115", "116", "117", "118", "119", "12", "120", "121", "122", 
"123", "124", "125", "126", "127", "128", "129", "13", "130", 
"131", "132", "133", "134", "135", "136", "137", "138", "139", 
"14", "140", "141", "142", "143", "144", "145", "146", "147", 
"148", "149", "15", "150", "151", "152", "153", "154", "155", 
"156", "157", "158", "159", "16", "160", "161", "162", "163", 
"164", "165", "166", "167", "168", "169", "17", "170", "171", 
"172", "173", "174", "175", "176", "177", "178", "179", "18", 
"180", "181", "182", "183", "184", "185", "186", "187", "188", 
"189", "19", "190", "191", "192", "193", "194", "195", "196", 
"197", "198", "199", "2", "20", "200", "201", "202", "203", "204", 
"205", "206", "207", "208", "209", "21", "210", "211", "212", 
"213", "214", "215", "216", "217", "218", "219", "22", "220", 
"221", "222", "223", "224", "225", "226", "227", "228", "229", 
"23", "230", "231", "232", "233", "234", "235", "236", "237", 
"238", "239", "24", "240", "241", "242", "243", "244", "245", 
"246", "247", "248", "249", "25", "250", "251", "252", "253", 
"254", "255", "256", "257", "258", "259", "26", "260", "261", 
"262", "263", "264", "265", "266", "267", "268", "269", "27", 
"270", "271", "272", "273", "274", "275", "276", "277", "278", 
"279", "28", "280", "281", "282", "283", "284", "285", "286", 
"287", "288", "289", "29", "290", "291", "292", "293", "294", 
"295", "296", "297", "298", "299", "3", "30", "300", "301", "302", 
"303", "304", "305", "306", "307", "308", "309", "31", "310", 
"311", "312", "313", "314", "315", "316", "317", "318", "319", 
"32", "320", "321", "322", "323", "324", "325", "326", "327", 
"328", "329", "33", "330", "331", "332", "333", "334", "335", 
"336", "337", "338", "339", "34", "340", "341", "342", "343", 
"344", "345", "346", "347", "348", "349", "35", "350", "351", 
"352", "353", "354", "355", "356", "357", "358", "359", "36", 
"360", "361", "362", "363", "364", "365", "366", "367", "368", 
"369", "37", "370", "371", "372", "373", "374", "375", "376", 
"377", "378", "379", "38", "380", "381", "382", "383", "384", 
"385", "386", "387", "388", "389", "39", "390", "391", "392", 
"393", "394", "395", "396", "397", "398", "399", "4", "40", "400", 
"401", "402", "403", "404", "405", "406", "407", "408", "409", 
"41", "410", "411", "412", "413", "414", "415", "416", "417", 
"418", "419", "42", "420", "421", "422", "423", "424", "425", 
"426", "427", "428", "429", "43", "430", "431", "432", "433", 
"434", "435", "436", "437", "438", "439", "44", "440", "441", 
"442", "443", "444", "445", "446", "447", "448", "449", "45", 
"450", "451", "452", "453", "454", "455", "456", "457", "458", 
"459", "46", "460", "461", "462", "463", "464", "465", "466", 
"467", "468", "469", "47", "470", "471", "472", "473", "474", 
"475", "476", "477", "478", "479", "48", "480", "481", "482", 
"483", "484", "485", "486", "487", "488", "489", "49", "490", 
"491", "492", "493", "494", "495", "496", "497", "498", "499", 
"5", "50", "500", "501", "502", "503", "504", "505", "506", "507", 
"508", "509", "51", "510", "511", "512", "513", "514", "515", 
"516", "517", "518", "519", "52", "520", "521", "522", "523", 
"524", "525", "526", "527", "528", "529", "53", "530", "531", 
"532", "533", "534", "535", "536", "537", "538", "539", "54", 
"540", "541", "542", "543", "544", "545", "546", "547", "548", 
"549", "55", "550", "551", "552", "553", "554", "555", "556", 
"557", "558", "559", "56", "560", "561", "562", "563", "564", 
"565", "566", "567", "568", "569", "57", "570", "571", "572", 
"573", "574", "575", "576", "577", "578", "579", "58", "580", 
"581", "582", "583", "584", "585", "586", "587", "588", "589", 
"59", "590", "591", "592", "593", "594", "595", "596", "597", 
"598", "599", "6", "60", "600", "601", "602", "603", "604", "605", 
"606", "607", "608", "609", "61", "610", "611", "612", "613", 
"614", "615", "616", "617", "618", "619", "62", "620", "621", 
"622", "623", "624", "625", "626", "627", "628", "629", "63", 
"630", "631", "632", "633", "634", "635", "636", "637", "638", 
"639", "64", "640", "641", "642", "643", "644", "645", "646", 
"647", "648", "649", "65", "650", "651", "652", "653", "654", 
"655", "656", "657", "658", "659", "66", "660", "661", "662", 
"663", "664", "665", "666", "667", "668", "669", "67", "670", 
"671", "672", "673", "674", "675", "676", "677", "678", "679", 
"68", "680", "681", "682", "683", "684", "685", "686", "687", 
"688", "689", "69", "690", "691", "692", "693", "694", "695", 
"696", "697", "698", "699", "7", "70", "700", "701", "702", "703", 
"704", "705", "706", "707", "708", "709", "71", "710", "711", 
"712", "713", "714", "715", "716", "717", "718", "719", "72", 
"720", "721", "722", "723", "724", "725", "726", "727", "728", 
"729", "73", "730", "731", "732", "733", "734", "735", "736", 
"737", "738", "739", "74", "740", "741", "742", "743", "744", 
"745", "746", "747", "748", "749", "75", "750", "751", "752", 
"753", "754", "755", "756", "757", "758", "759", "76", "760", 
"761", "762", "763", "764", "765", "766", "767", "768", "769", 
"77", "770", "771", "772", "773", "774", "775", "776", "777", 
"778", "779", "78", "780", "781", "782", "783", "784", "785", 
"786", "787", "788", "789", "79", "790", "791", "792", "793", 
"794", "795", "796", "797", "798", "799", "8", "80", "800", "801", 
"802", "803", "804", "805", "806", "807", "808", "809", "81", 
"810", "811", "812", "813", "814", "815", "816", "817", "818", 
"819", "82", "820", "821", "822", "823", "824", "825", "826", 
"827", "828", "829", "83", "830", "831", "832", "833", "834", 
"835", "836", "837", "838", "839", "84", "840", "841", "842", 
"843", "844", "845", "846", "847", "848", "849", "85", "850", 
"851", "852", "853", "854", "855", "856", "857", "858", "859", 
"86", "860", "861", "862", "863", "864", "865", "866", "867", 
"868", "869", "87", "870", "871", "872", "873", "874", "875", 
"876", "877", "878", "879", "88", "880", "881", "882", "883", 
"884", "885", "886", "887", "888", "889", "89", "890", "891", 
"892", "893", "894", "895", "896", "897", "898", "899", "9", 
"90", "900", "901", "902", "903", "904", "905", "906", "907", 
"908", "909", "91", "910", "911", "912", "913", "914", "915", 
"916", "917", "918", "919", "92", "920", "921", "922", "923", 
"924", "925", "926", "927", "928", "929", "93", "930", "931", 
"932", "933", "934", "935", "936", "937", "938", "939", "94", 
"940", "941", "942", "943", "944", "945", "946", "947", "948", 
"949", "95", "950", "951", "952", "953", "954", "955", "956", 
"957", "958", "959", "96", "960", "961", "962", "963", "964", 
"965", "966", "967", "968", "969", "97", "970", "971", "972", 
"973", "974", "975", "976", "977", "978", "979", "98", "980", 
"981", "982", "983", "984", "985", "986", "987", "988", "989", 
"99", "990", "991", "992", "993", "994", "995", "996", "997", 
"998", "999"), class = "factor"), ER_Groups = c(2, 2, 2, 3, 5, 
6), Er_Count = c(60, 75, 14, 12, 8, 26), Mn_Count = c(30, 9, 6, 33, 
7, 12), Bp_Groups = c(1, 2, 1, 1, 0, 1), Bp_Count = c(3, 3, 2, 
5, 0, 6), Mn_Groups = c(1, 1, 3, 1, 0, 0), Month = c(10L, 6L, 
12L, 4L, 2L, 4L), Year = c(2000L, 2001L, 2009L, 2004L, 2002L, 
2001L), SurveyID = structure(c(16L, 24L, 93L, 56L, 34L, 22L), .Label = c("199708HS", 
"199808HS", "199908HS", "199909SSLQ", "199910SSL", "199911SSL", 
"200001SSLQ", "200002SSL", "200003SSLQ", "200004SSLQ", "200005SSL", 
"200006SSL", "200007SSL", "200008HS", "200008SSL", "200009SSL", 
"200010SSL", "200011SSL", "200101SSL", "200102SSL", "200103SSL", 
"200104SSL", "200105SSL", "200106SSL", "200107SSL", "200108HS", 
"200108SSL", "200109SSL", "200110SSL", "200111SSL", "200112SSL", 
"200201SSL", "200202SSL", "200203SSL", "200204SSL", "200205SSL", 
"200206SSL", "200207SSL", "200208HS", "200208SSL", "200210SSL", 
"200211SSL", "200212SSL", "200301SSL", "200302SSL", "200303SSL", 
"200304SSL", "200305SSL", "200306SSL", "200307SSL", "200309SSL", 
"200310SSL", "200311SSL", "200312SSL", "200403SSL", "200404SSL", 
"200405SSL", "200406SSL", "200407SSL", "200408HS", "200408SSL", 
"200409SSL", "200505SSL", "200506SSL", "200507SSL", "200510SSL", 
"200512SSL", "200603SSL", "200609SSL", "200612SSL", "200709GAP07", 
"200710GAP07", "200712GAP07", "200802GAP07", "200803GAP07", "200804GAP07", 
"200805GAP07", "200806GAP07", "200807GAP07", "200808GAP07", "200809GAP08", 
"200810GAP08", "200812GAP08", "200901GAP08", "200903GAP08", "200904GAP08", 
"200905GAP08", "200906GAP08", "200907GAP08", "200908GAP08", "200909GAP08", 
"200910GAP09", "200912GAP09", "201001GAP09", "201002GAP09", "201003GAP09", 
"201004GAP09", "201005GAP09", "201006GAP09", "201007GAP09", "201008GAP09", 
"201009GAP09", "201010GAP09", "201011GAP09", "201101GAP09", "201102GAP09", 
"201103GAP09", "201104GAP09", "201106GAP09", "201108GAP09", "201109GAP09", 
"201111GAP09", "201201GAP09", "201203GAP09", "201205GAP09", "201207GAP09", 
"201208GAP09", "201211GAP09", "201301GAP09", "201303GAP09", "201305GAP09", 
"201307GAP09", "201309GAP09", "201311GAP09"), class = "factor"), 
    Er_Group_Density = c(4, 9, 12, 4, 1, 0), Mn_Group_Density = c(3, 
    1, 1, 1, 0, 2), Bp_Group_Density = c(1, 2, 1, 0, 1, 0), Er_Count_Density = c(50, 
    14, 12, 9, 6, 4), Mn_Count_Density = c(9, 5, 2, 3, 2, 0), Bp_Count_Density = c(2, 
    3, 0, 4, 1, 0)), .Names = c("Grid_ID", "ER_Groups", "Er_Count", 
"Mn_Count", "Bp_Groups", "Bp_Count", "Mn_Groups", "Month", "Year", 
"SurveyID", "Er_Group_Density", "Mn_Group_Density", "Bp_Group_Density", 
"Er_Count_Density", "Mn_Count_Density", "Bp_Count_Density"), row.names = c(2770L, 
4421L, 17348L, 11263L, 6736L, 3974L), class = "data.frame")

There are a number of ways to get statistics by group. I'll assume you prefer plyr, since your example uses it.

Remember that dlply() splits the data into smaller data frames by the grouping variables, then applies the requested function to each of them. The function you pass must therefore operate on a whole data frame. sum() does not: called on a data frame that contains non-numeric columns (here the factors Grid_ID and SurveyID), it raises the error you saw. You can write your own function, though.
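To make the failure concrete, here is a minimal base R sketch. The data frame `effort` and its values are made up for illustration and only stand in for Effort_All; the exact wording of the error message can differ slightly between R versions.

```r
# Toy data frame standing in for Effort_All (hypothetical values).
effort <- data.frame(
  Grid_ID  = factor(c("517", "517", "125")),
  Month    = c(10L, 10L, 6L),
  Er_Count = c(60, 5, 75),
  Mn_Count = c(30, 1, 9),
  Bp_Count = c(3, 0, 3)
)

# sum() on a data frame that contains a factor column fails;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(sum(effort), error = function(e) conditionMessage(e))
print(err)

# Restricting to the numeric count columns avoids the error, and
# colSums() returns one total per column rather than a single number.
count_cols <- c("Er_Count", "Mn_Count", "Bp_Count")
totals <- colSums(effort[, count_cols])
print(totals)
```

The same thing happens inside dlply(): each piece it hands to the function still carries every column of the original data frame, factors included.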

Based on your description, what you want is something like this:

myfun <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])
dlply(Effort_All, c("Month", "Year", "Grid_ID", "SurveyID"), myfun)

Remember that the second argument to dlply() is the set of grouping variables, not the columns to summarise. I'm not sure why you want the output as a list; it may be easier to read if you use ddply() with the same arguments, which returns a data frame.
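As a rough sketch of the ddply() variant, assuming plyr is installed: the toy data frame `effort` below is hypothetical and only mimics the relevant columns of Effort_All.

```r
library(plyr)

# Toy data frame standing in for Effort_All (hypothetical values).
effort <- data.frame(
  Grid_ID  = factor(c("517", "517", "125")),
  SurveyID = factor(c("200009SSL", "200009SSL", "200106SSL")),
  Month    = c(10L, 10L, 6L),
  Year     = c(2000L, 2000L, 2001L),
  Er_Count = c(60, 5, 75),
  Mn_Count = c(30, 1, 9),
  Bp_Count = c(3, 0, 3)
)

# Column sums of the three count variables for one group's rows.
myfun <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])

# Same arguments as the dlply() call, but the result is a single
# data frame with one row per observed group.
by_group <- ddply(effort, c("Month", "Year", "Grid_ID", "SurveyID"), myfun)
print(by_group)
```

Because the first two toy rows share all four grouping values, they collapse into one output row.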

Other approaches include sqldf() or something like lapply().

=============== EDIT: Other approaches ===============

sqldf is very easy to read and understand:

library(sqldf)  # provides sqldf(), which runs SQL queries against data frames
output <- sqldf('select Month,Year,Grid_ID,SurveyID,
                        sum(Er_Count) as ercount, 
                        sum(Mn_Count) as mncount,
                        sum(Bp_Count) as bpcount
                 from Effort_All 
                 group by Month, Year, Grid_ID, SurveyID')

lapply() works much the same way as dlply() once the data frame has been split into groups (for example with split()); only the arguments differ.
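A base R sketch of that idea, using split() to form the groups and lapply() to summarise each one. The data frame `effort` and the name `count_cols` are illustrative assumptions, not part of the original data.

```r
# Toy data frame standing in for Effort_All (hypothetical values).
effort <- data.frame(
  Grid_ID  = factor(c("517", "517", "125")),
  SurveyID = factor(c("200009SSL", "200009SSL", "200106SSL")),
  Month    = c(10L, 10L, 6L),
  Year     = c(2000L, 2000L, 2001L),
  Er_Count = c(60, 5, 75),
  Mn_Count = c(30, 1, 9),
  Bp_Count = c(3, 0, 3)
)
count_cols <- c("Er_Count", "Mn_Count", "Bp_Count")

# split() plays the role of dlply()'s grouping argument;
# drop = TRUE discards combinations that never occur in the data.
groups <- split(
  effort,
  list(effort$Month, effort$Year, effort$Grid_ID, effort$SurveyID),
  drop = TRUE
)

# lapply() then applies the summary function to each piece,
# returning a named list just as dlply() would.
sums <- lapply(groups, function(x) colSums(x[, count_cols]))
print(sums)
```

If a table is preferred over a list, do.call(rbind, sums) stacks the per-group vectors into a matrix with one row per group.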

Also, you could use colwise() from plyr:

 dlply(Effort_All, .(Month, Year, Grid_ID, SurveyID), colwise(sum, .(Er_Count, Mn_Count, Bp_Count)))

Or summarise_each() from dplyr:

library(dplyr)
Effort_All %>%
  group_by(Month, Year, Grid_ID, SurveyID) %>%
  summarise_each(funs(sum), Er_Count, Mn_Count, Bp_Count)
#Source: local data frame [6 x 7]
#Groups: Month, Year, Grid_ID

#    Month Year Grid_ID    SurveyID Er_Count Mn_Count Bp_Count
#  1     2 2002     884   200203SSL        8        7        0
#  2     4 2001     126   200104SSL       26       12        6
#  3     4 2004     399   200404SSL       12       33        5
#  4     6 2001     125   200106SSL       75        9        3
#  5    10 2000     517   200009SSL       60       30        3
#  6    12 2009     340 200912GAP09       14        6        2
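
In more recent dplyr releases, summarise_each() and funs() are deprecated in favour of summarise() with across(). A minimal sketch of the equivalent call, assuming dplyr 1.0.0 or later and using a hypothetical toy data frame `effort` in place of Effort_All:

```r
library(dplyr)

# Toy data frame standing in for Effort_All (hypothetical values).
effort <- data.frame(
  Grid_ID  = factor(c("517", "517", "125")),
  SurveyID = factor(c("200009SSL", "200009SSL", "200106SSL")),
  Month    = c(10L, 10L, 6L),
  Year     = c(2000L, 2000L, 2001L),
  Er_Count = c(60, 5, 75),
  Mn_Count = c(30, 1, 9),
  Bp_Count = c(3, 0, 3)
)

# across() applies sum() to each selected column within every group;
# .groups = "drop" returns an ungrouped result.
by_group <- effort %>%
  group_by(Month, Year, Grid_ID, SurveyID) %>%
  summarise(across(c(Er_Count, Mn_Count, Bp_Count), sum), .groups = "drop")
print(by_group)
```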
