
Merging rows in R while excluding certain data

Let's say I have a data frame with many subjects and many test variables:

   Name      Date1      Date2 `Test1` `Test2` `Test3`
  <chr>     <dttm>     <dttm>   <chr>   <chr>   <chr>
1 Steve 2012-02-27 2011-11-18    <NA>    <NA>      3
2 Steve 2012-02-27 2012-01-22      4     <NA>    <NA>
3 Steve 2012-02-27 2014-08-09    <NA>      8     <NA>
4 Mike  2012-02-09 2007-03-29      1       2       3
5 Mike  2012-02-09 2009-07-13    <NA>      5       6
6 Mike  2012-02-09 2014-03-11    <NA>    <NA>      9
7 John  2012-03-20 2013-10-22      1       2     <NA>
8 John  2012-03-20 2014-03-17      4       5     <NA>
9 John  2012-03-20 2015-06-01    <NA>      8       9
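The data above is only shown as printed output. Here is a sketch that rebuilds it as a tibble so the code can actually be run. The column types are assumptions read off the printout: Name and the test columns are character, and Date1/Date2 are POSIXct in UTC.

```r
library(tibble)

# Rebuild the example data from the question.
# Assumed types: character Name and Test columns, POSIXct dates in UTC.
df <- tibble(
  Name  = rep(c("Steve", "Mike", "John"), each = 3),
  Date1 = as.POSIXct(rep(c("2012-02-27", "2012-02-09", "2012-03-20"), each = 3),
                     tz = "UTC"),
  Date2 = as.POSIXct(c("2011-11-18", "2012-01-22", "2014-08-09",
                       "2007-03-29", "2009-07-13", "2014-03-11",
                       "2013-10-22", "2014-03-17", "2015-06-01"), tz = "UTC"),
  Test1 = c(NA, "4", NA, "1", NA, NA, "1", "4", NA),
  Test2 = c(NA, NA, "8", "2", "5", NA, "2", "5", "8"),
  Test3 = c("3", NA, NA, "3", "6", "9", NA, NA, "9")
)

print(df)
```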

I would like to know how (most likely with dplyr) to exclude the rows whose Date2 is later than their Date1. Then I want to combine the remaining rows into one row per Name, so that for each test the most recent result (by Date2) replaces any earlier one. Finally, I want a new data frame that drops the Date2 column while still keeping NA wherever a test has no remaining result.
Also, if none of a subject's Date2 values fall on or before Date1, I would like to keep the Name (and Date1) but fill the test columns with NAs, as in the case of John.
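To make the merge rule concrete, here is a base-R sketch of the per-column step applied to Steve's two rows that survive the date filter, sorted oldest first. `last_non_na()` is a hypothetical helper written for this illustration, not a base R or dplyr function: it returns the newest non-missing value, or a typed NA when a column has no result at all, which is how Steve's Test2 ends up NA.

```r
# Hypothetical helper (not part of base R or dplyr): the most recent
# non-missing value of a vector sorted oldest-to-newest, or a typed NA
# when every value is missing.
last_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) x[NA_integer_] else x[[length(x)]]
}

# Steve's rows with Date2 on or before Date1 (2011-11-18, then 2012-01-22)
steve <- data.frame(
  Test1 = c(NA, "4"),
  Test2 = c(NA_character_, NA_character_),
  Test3 = c("3", NA),
  stringsAsFactors = FALSE
)

# Merge the rows column by column: Test1 "4", Test2 NA, Test3 "3"
merged <- sapply(steve, last_non_na)
print(merged)
```

Applied to every Name group, this gives one merged row per subject. John has no rows left after filtering, so his all-NA row has to be added back separately.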

So the results should look like this:

   Name      Date1 `Test1` `Test2` `Test3`
  <chr>     <dttm>   <chr>   <chr>   <chr>
1 Steve 2012-02-27      4     <NA>      3
2 Mike  2012-02-09      1       5       6
3 John  2012-03-20    <NA>    <NA>    <NA>

Any help on this would be greatly appreciated, thank you.

This will do it with dplyr:

library(dplyr)
df2 <- df %>%
  filter(as.Date(Date2) <= as.Date(Date1)) %>%       # drop rows whose Date2 is after Date1
  arrange(as.Date(Date2)) %>%                         # order by Date2, oldest first
  group_by(Name, Date1) %>%                           # one group per Name/Date1
  summarise_all(function(x) last(x[!is.na(x)])) %>%   # keep the most recent non-NA value in each remaining column
  right_join(df %>% distinct(Name, Date1),
             by = c("Name", "Date1")) %>%             # add back Names with no kept rows (e.g. John) as all-NA rows
  select(-Date2)                                      # drop Date2

df2

   Name      Date1 Test1 Test2 Test3
1 Steve 2012-02-27     4  <NA>     3
2  Mike 2012-02-09     1     5     6
3  John 2012-03-20  <NA>  <NA>  <NA>
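For dplyr 1.0 or later (an assumption about your installed version, since `across()` and `.groups` were added there), the same pipeline can be sketched without the superseded `summarise_all()`. This version rebuilds the example data so it runs on its own. It summarises only the `Test` columns, so Date2 never needs to be dropped afterwards, and it names the join columns explicitly. It also sorts the result back into the original Name order, because the row order that `right_join()` produces is not something to rely on across dplyr versions.

```r
library(dplyr)

# Example data from the question (assumed types: character Name and tests,
# POSIXct dates in UTC).
df <- tibble(
  Name  = rep(c("Steve", "Mike", "John"), each = 3),
  Date1 = as.POSIXct(rep(c("2012-02-27", "2012-02-09", "2012-03-20"), each = 3),
                     tz = "UTC"),
  Date2 = as.POSIXct(c("2011-11-18", "2012-01-22", "2014-08-09",
                       "2007-03-29", "2009-07-13", "2014-03-11",
                       "2013-10-22", "2014-03-17", "2015-06-01"), tz = "UTC"),
  Test1 = c(NA, "4", NA, "1", NA, NA, "1", "4", NA),
  Test2 = c(NA, NA, "8", "2", "5", NA, "2", "5", "8"),
  Test3 = c("3", NA, NA, "3", "6", "9", NA, NA, "9")
)

df2 <- df %>%
  filter(as.Date(Date2) <= as.Date(Date1)) %>%   # drop results recorded after Date1
  arrange(Date2) %>%                             # oldest first, so last() picks the newest
  group_by(Name, Date1) %>%
  summarise(across(starts_with("Test"), ~ last(.x[!is.na(.x)])),
            .groups = "drop") %>%                # one ungrouped row per Name/Date1
  right_join(distinct(df, Name, Date1),
             by = c("Name", "Date1")) %>%        # restore John as an all-NA row
  arrange(match(Name, unique(df$Name)))          # original Name order

print(df2)
```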

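Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.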

© 2020-2024 STACKOOM.COM