How to group data by column and apply a self-defined function to each small group
I have a medication dataset that contains information about each patient and the medications they take:
Record.ID Label.Name Generic.Medication.Name Strength Quantity Days.Supplied Date.of.Fill GCN GC3 NDC category
4 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 30000 30 2014-06-18 19154 M4D 00310075290 statins
5 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 90000 90 2014-06-25 19154 M4D 00310075290 statins
6 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 90000 90 2014-09-30 19154 M4D 00310075290 statins
7 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 90000 90 2014-12-18 19154 M4D 00310075290 statins
8 aaaaa CRESTOR TAB 10 MG ROSUVASTATIN CALCIUM TAB 10 MG 10 MG 90000 90 2014-12-18 19153 M4D 00310075190 statins
60 bbbbb TELMISARTAN TAB 20 MG TELMISARTAN TAB 20 MG 20 MG 90000 90 2014-01-24 23833 A4F 00054054218 RASA
61 bbbbb TELMISARTAN TAB 20 MG TELMISARTAN TAB 20 MG 20 MG 90000 90 2014-04-03 23833 A4F 00054054218 RASA
62 bbbbb TELMISARTAN TAB 20 MG TELMISARTAN TAB 20 MG 20 MG 90000 90 2014-07-21 23833 A4F 00054054218 RASA
63 bbbbb TELMISARTAN TAB 20 MG TELMISARTAN TAB 20 MG 20 MG 90000 90 2014-10-22 23833 A4F 00054054218 RASA
66 ccccc ENALAPRIL MALEATE TAB 2.5 MG ENALAPRIL MALEATE TAB 2.5 MG 2.5 MG 15000 30 2014-01-06 963 A4D 00378105101 RASA
I have a function, calc_adherence <- function(fill, year), that takes a dataset as input and returns a single row. For example, with
fill <-
Record.ID Label.Name Generic.Medication.Name Strength Quantity Days.Supplied Date.of.Fill GCN GC3 NDC category
4 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 30000 30 2014-06-18 19154 M4D 00310075290 statins
5 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 90000 90 2014-06-25 19154 M4D 00310075290 statins
6 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 90000 90 2014-09-30 19154 M4D 00310075290 statins
7 aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG 90000 90 2014-12-18 19154 M4D 00310075290 statins
8 aaaaa CRESTOR TAB 10 MG ROSUVASTATIN CALCIUM TAB 10 MG 10 MG 90000 90 2014-12-18 19153 M4D 00310075190 statins
the function returns:
Record.ID Label.Name Generic.Medication.Name Strength Category First_fill Last_fill Duration DaysCovered Year Method Adherence
aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG statins 2014-06-18 2014-12-18 197 days 197 2014 PDC 1
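My problem is that I now want to group the medication dataset by Record.ID and category first, and then apply calc_adherence to each group, so that I get one result per patient and medication category. I tried
ddply(category.medication, c('Record.ID','category'), summarize, function(x) calc_adherence(x, year))
but this does not work. The final dataset I want is: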
Record.ID Label.Name Generic.Medication.Name Strength Category First_fill Last_fill Duration DaysCovered Year Method Adherence
aaaaa CRESTOR TAB 20 MG ROSUVASTATIN CALCIUM TAB 20 MG 20 MG statins 2014-06-18 2014-12-18 197 days 197 2014 PDC 1
aaaaa ... RASA ... PDC 0.8
bbbbb ... RASA ... PDC 0.75
Try aggregate.
For example, if your data frame is named data:
aggregate(data, by = list(data$Record.ID), calc_adherence)
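One caveat: aggregate calls its function on each column of a group separately, so a function that expects the whole group as a data frame may not behave as intended there. A minimal base R sketch of the split-apply-combine pattern the question describes follows. The toy data frame fills and the body of calc_adherence are hypothetical stand-ins (the real function is not shown in the question); only the column names Record.ID, category, Days.Supplied and Date.of.Fill come from the original data.

```r
# Toy data using column names from the question (values abbreviated).
fills <- data.frame(
  Record.ID     = c("aaaaa", "aaaaa", "bbbbb", "bbbbb", "ccccc"),
  category      = c("statins", "statins", "RASA", "RASA", "RASA"),
  Days.Supplied = c(30, 90, 90, 90, 30),
  Date.of.Fill  = as.Date(c("2014-06-18", "2014-06-25",
                            "2014-01-24", "2014-04-03", "2014-01-06")),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the asker's calc_adherence(fill, year):
# it receives the rows of one group and returns a one-row data frame.
calc_adherence <- function(fill, year) {
  data.frame(
    Record.ID  = fill$Record.ID[1],
    Category   = fill$category[1],
    First_fill = min(fill$Date.of.Fill),
    Last_fill  = max(fill$Date.of.Fill),
    Fills      = nrow(fill),
    Year       = year,
    stringsAsFactors = FALSE
  )
}

year <- 2014

# Split into one data frame per (Record.ID, category) pair;
# drop = TRUE skips combinations that never occur in the data.
groups <- split(fills, list(fills$Record.ID, fills$category), drop = TRUE)

# Apply the function to each group, then stack the one-row results.
result <- do.call(rbind, lapply(groups, calc_adherence, year = year))
rownames(result) <- NULL
print(result)
```

If plyr is already in use, the same idea should work by passing the function to ddply directly instead of going through summarize, for example ddply(fills, c("Record.ID", "category"), function(x) calc_adherence(x, year)).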