
Merging complementary rows of a dataframe with R

I have a data frame like this:

0     weekday day month year hour basal bolus carb period.h
1    Tuesday  01    03 2016  0.0  0.25    NA   NA        0
2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10
4    Tuesday  01    03 2016 12.0  0.30    NA   NA       12
5    Tuesday  01    03 2016 17.0  0.50    NA   NA       17
6    Tuesday  01    03 2016 17.6    NA    NA   33       17
7    Tuesday  01    03 2016 17.6    NA  1.35   NA       17
8    Tuesday  01    03 2016 18.6    NA    NA   44       18
9    Tuesday  01    03 2016 18.6    NA  1.80   NA       18
10   Tuesday  01    03 2016 18.9    NA    NA   17       18
11   Tuesday  01    03 2016 18.9    NA  0.70   NA       18
12   Tuesday  01    03 2016 22.0  0.40    NA   NA       22
13 Wednesday  02    03 2016  0.0  0.25    NA   NA        0
14 Wednesday  02    03 2016  9.7    NA    NA   39        9
15 Wednesday  02    03 2016  9.7    NA  2.65   NA        9
16 Wednesday  02    03 2016 11.2    NA    NA   13       11
17 Wednesday  02    03 2016 11.2    NA  0.30   NA       11
18 Wednesday  02    03 2016 12.0  0.30    NA   NA       12
19 Wednesday  02    03 2016 12.0    NA    NA   16       12
20 Wednesday  02    03 2016 12.0    NA  0.65   NA       12

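If you look at rows 2 and 3, you will notice that they correspond to exactly the same day and time. The only difference is that in row 2 "carb" is not NA, while in row 3 "bolus" is not NA (this is diabetes data).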

I would like to merge such rows into one:

2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10

->

2    Tuesday  01    03 2016 10.9    NA  4.15   67       10

I could of course run a brute-force double loop over every row, but I am looking for a smarter and faster way.

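You can group the data frame by the common identifier columns weekday, day, month, year, hour, period.h, and then, for each of the remaining columns you want to merge, sort it and take the first element. By default sort() removes the NAs from the vector being sorted, so each column in each group ends up with its non-NA element; if all elements of a column are NA, sort(col)[1] returns NA: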

library(dplyr)
df %>% 
       group_by(weekday, day, month, year, hour, period.h) %>% 
       summarise_all(funs(sort(.)[1]))

#      weekday   day month  year  hour period.h basal bolus  carb
#       <fctr> <int> <int> <int> <dbl>    <int> <dbl> <dbl> <int>
# 1    Tuesday     1     3  2016   0.0        0  0.25    NA    NA
# 2    Tuesday     1     3  2016  10.9       10    NA  4.15    67
# 3    Tuesday     1     3  2016  12.0       12  0.30    NA    NA
# 4    Tuesday     1     3  2016  17.0       17  0.50    NA    NA
# 5    Tuesday     1     3  2016  17.6       17    NA  1.35    33
# 6    Tuesday     1     3  2016  18.6       18    NA  1.80    44
# 7    Tuesday     1     3  2016  18.9       18    NA  0.70    17
# 8    Tuesday     1     3  2016  22.0       22  0.40    NA    NA
# 9  Wednesday     2     3  2016   0.0        0  0.25    NA    NA
# 10 Wednesday     2     3  2016   9.7        9    NA  2.65    39
# 11 Wednesday     2     3  2016  11.2       11    NA  0.30    13
# 12 Wednesday     2     3  2016  12.0       12  0.30  0.65    16
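
The behaviour the answer relies on can be checked in base R. The sketch below uses made-up vectors (x, all_na, and the three-element example are illustrative, not taken from the question's data). It also shows one caveat: when a group holds more than one non-NA value, sort() returns the smallest one, not the first one encountered.

```r
# sort() drops NA values by default (its na.last argument defaults to NA)
x <- c(NA, 4.15)
first_non_na <- sort(x)[1]
first_non_na                      # 4.15

# An all-NA vector sorts to a zero-length vector; indexing [1] then gives NA
all_na <- c(NA_real_, NA_real_)
all_na_result <- sort(all_na)[1]
all_na_result                     # NA

# Caveat: with several non-NA values, sort() picks the smallest ...
smallest <- sort(c(NA, 3, 1))[1]
smallest                          # 1

# ... whereas na.omit() keeps the original order and picks the first
first_kept <- na.omit(c(NA, 3, 1))[1]
first_kept                        # 3
```

For the data in the question each group has at most one non-NA value per column, so both approaches give the same result.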

Instead of sort(), a more appropriate function to use here is na.omit():

df %>% group_by(weekday, day, month, year, hour, period.h) %>% 
       summarise_all(funs(na.omit(.)[1]))
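
In recent dplyr releases funs() is deprecated and summarise_all() is superseded by across(). A minimal sketch of the same idea in that style follows, assuming dplyr 1.0.0 or later is installed. The three-row df is a hand-built subset of the question's data (rows 2 to 4), and the name merged is only illustrative.

```r
library(dplyr)

# Hand-built subset of the question's data (rows 2-4), for illustration only
df <- data.frame(
  weekday  = c("Tuesday", "Tuesday", "Tuesday"),
  day      = c(1L, 1L, 1L),
  month    = c(3L, 3L, 3L),
  year     = c(2016L, 2016L, 2016L),
  hour     = c(10.9, 10.9, 12.0),
  basal    = c(NA, NA, 0.30),
  bolus    = c(NA, 4.15, NA),
  carb     = c(67L, NA, NA),
  period.h = c(10L, 10L, 12L)
)

merged <- df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  # For every non-grouping column, keep the first non-NA value in the group;
  # an all-NA column yields NA because indexing an empty vector with [1] gives NA
  summarise(across(everything(), ~ na.omit(.x)[1]), .groups = "drop")

print(merged)
```

The two rows sharing hour 10.9 collapse into one row carrying both the bolus and the carb value, while the 12.0 row is left unchanged.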

