
Merging complementary rows of a dataframe with R

I have a data frame like this:

     weekday day month year hour basal bolus carb period.h
1    Tuesday  01    03 2016  0.0  0.25    NA   NA        0
2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10
4    Tuesday  01    03 2016 12.0  0.30    NA   NA       12
5    Tuesday  01    03 2016 17.0  0.50    NA   NA       17
6    Tuesday  01    03 2016 17.6    NA    NA   33       17
7    Tuesday  01    03 2016 17.6    NA  1.35   NA       17
8    Tuesday  01    03 2016 18.6    NA    NA   44       18
9    Tuesday  01    03 2016 18.6    NA  1.80   NA       18
10   Tuesday  01    03 2016 18.9    NA    NA   17       18
11   Tuesday  01    03 2016 18.9    NA  0.70   NA       18
12   Tuesday  01    03 2016 22.0  0.40    NA   NA       22
13 Wednesday  02    03 2016  0.0  0.25    NA   NA        0
14 Wednesday  02    03 2016  9.7    NA    NA   39        9
15 Wednesday  02    03 2016  9.7    NA  2.65   NA        9
16 Wednesday  02    03 2016 11.2    NA    NA   13       11
17 Wednesday  02    03 2016 11.2    NA  0.30   NA       11
18 Wednesday  02    03 2016 12.0  0.30    NA   NA       12
19 Wednesday  02    03 2016 12.0    NA    NA   16       12
20 Wednesday  02    03 2016 12.0    NA  0.65   NA       12

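If you look at rows 2 and 3, you'll notice that they correspond to exactly the same day and time: in row 2 only "carb" is not NA, while in row 3 only "bolus" is not NA (this is diabetes data).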

I would like to merge such rows into one:

2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10

->

2    Tuesday  01    03 2016 10.9    NA  4.15   67       10

I could of course run a brute-force double loop over every row, but I'm looking for a smarter and faster way.

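You can group your data frame by the common identifier columns weekday, day, month, year, hour and period.h, then sort each of the remaining columns you want to merge and take its first element. By default, sort() removes NA values from the vector being sorted, so each column ends up with its non-NA element within each group; if every element of a column is NA, sort(col)[1] returns NA: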

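library(dplyr)  # across() requires dplyr >= 1.0.0

df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  # funs() is deprecated since dplyr 0.8.0; across() is the current idiom
  summarise(across(everything(), ~ sort(.x)[1]), .groups = "drop")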

#      weekday   day month  year  hour period.h basal bolus  carb
#       <fctr> <int> <int> <int> <dbl>    <int> <dbl> <dbl> <int>
# 1    Tuesday     1     3  2016   0.0        0  0.25    NA    NA
# 2    Tuesday     1     3  2016  10.9       10    NA  4.15    67
# 3    Tuesday     1     3  2016  12.0       12  0.30    NA    NA
# 4    Tuesday     1     3  2016  17.0       17  0.50    NA    NA
# 5    Tuesday     1     3  2016  17.6       17    NA  1.35    33
# 6    Tuesday     1     3  2016  18.6       18    NA  1.80    44
# 7    Tuesday     1     3  2016  18.9       18    NA  0.70    17
# 8    Tuesday     1     3  2016  22.0       22  0.40    NA    NA
# 9  Wednesday     2     3  2016   0.0        0  0.25    NA    NA
# 10 Wednesday     2     3  2016   9.7        9    NA  2.65    39
# 11 Wednesday     2     3  2016  11.2       11    NA  0.30    13
# 12 Wednesday     2     3  2016  12.0       12  0.30  0.65    16
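
The behaviour of sort() that this answer relies on can be checked in isolation. The snippet below is only a small sketch in base R; the vectors x, all_na and y are made up for illustration and are not part of the original data. It also shows a caveat: when a group holds more than one non-NA value, sort(col)[1] gives the smallest one, not the first.

```r
# sort() drops NA values by default (its na.last argument defaults to NA)
x <- c(NA, 4.15)
first_sorted <- sort(x)[1]

# An all-NA vector sorts to length zero; indexing past the end yields NA
all_na <- c(NA_real_, NA_real_)
first_all_na <- sort(all_na)[1]

# Caveat: with several non-NA values, the smallest is returned, not the first
y <- c(NA, 3, 1)
first_y <- sort(y)[1]

print(c(first_sorted, first_all_na, first_y))
```

For the data in the question this caveat does not matter, because each group has at most one non-NA value per column.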

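Rather than sort(), a more appropriate function here is na.omit(), which drops the NA values without reordering the rest, so [1] picks the first non-NA value instead of the smallest: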

df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  # funs() is deprecated; across() needs dplyr >= 1.0.0
  summarise(across(everything(), ~ na.omit(.x)[1]), .groups = "drop")
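
To try the na.omit() approach end to end, here is a self-contained sketch. It assumes dplyr 1.0.0 or later. The data frame is retyped by hand from rows 1-3 and 18-20 of the question, and the column types (integer day, month, year, carb and period.h; character weekday) are assumptions rather than something the question states.

```r
library(dplyr)

# Rows 1-3 and 18-20 of the question's data, retyped for illustration;
# the column types are assumed
df <- data.frame(
  weekday  = c("Tuesday", "Tuesday", "Tuesday",
               "Wednesday", "Wednesday", "Wednesday"),
  day      = c(1L, 1L, 1L, 2L, 2L, 2L),
  month    = 3L,
  year     = 2016L,
  hour     = c(0.0, 10.9, 10.9, 12.0, 12.0, 12.0),
  basal    = c(0.25, NA, NA, 0.30, NA, NA),
  bolus    = c(NA, NA, 4.15, NA, NA, 0.65),
  carb     = c(NA, 67L, NA, NA, 16L, NA),
  period.h = c(0L, 10L, 10L, 12L, 12L, 12L)
)

# Collapse each group to one row, keeping the first non-NA value per column
merged <- df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  summarise(across(everything(), ~ na.omit(.x)[1]), .groups = "drop")

print(merged)
```

The six input rows collapse to three, one per distinct combination of the grouping columns, and a column that is NA throughout a group stays NA.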

