Keep a row only if the next row meets a condition

Question

I'd like to know how I can keep only the rows for which a subsequent row in the same group meets a certain criterion. The following data illustrates what I am trying to achieve:

The data is sorted by ID in ascending order and by DATE in descending order.

Each ID has at most one row where EMPTY = 'N', but it can have zero, one, or more rows where EMPTY = 'Y'.

I want to track the dates on which the EMPTY status changes:

ID      EMPTY   DATE
1        Y     03/01/2017
1        Y     02/01/2017
1        N     01/01/2017
2        Y     03/01/2017
3        N     03/01/2017
4        Y     03/01/2017
4        N     03/01/2017
4        Y     03/01/2017
4        Y     03/01/2017

Output:

I want to keep all the rows with EMPTY = 'N':

ID     EMPTY   DATE
1        Y     02/01/2017
1        N     01/01/2017
2        Y     03/01/2017
3        N     03/01/2017
4        Y     03/01/2017
4        N     03/01/2017

I can use either SQL or Python for this, so solutions in either language (or both) are welcome!
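One possible reading of the rule, sketched in pandas and grouped by ID: keep every 'N' row plus the row directly before it within the same ID. The question does not say what an ID with no 'N' row should keep, so this sketch assumes it keeps its first (most recent) row, which reproduces ID 2's only row (dated 03/01/2017). The function name `keep_status_changes` is hypothetical, chosen for illustration.

```python
import pandas as pd

# Sample data from the question, already sorted by ID ascending and DATE descending.
df = pd.DataFrame({
    'ID':    [1, 1, 1, 2, 3, 4, 4, 4, 4],
    'EMPTY': ['Y', 'Y', 'N', 'Y', 'N', 'Y', 'N', 'Y', 'Y'],
    'DATE':  ['03/01/2017', '02/01/2017', '01/01/2017'] + ['03/01/2017'] * 6,
})


def keep_status_changes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep each 'N' row plus the row directly before it within the same ID.

    Assumption (not stated in the question): an ID with no 'N' row keeps
    only its first, i.e. most recent, row.
    """
    is_n = df['EMPTY'].eq('N')
    # EMPTY value of the next row *within the same ID* (NaN on each ID's last row)
    next_empty = df.groupby('ID')['EMPTY'].shift(-1)
    before_n = next_empty.eq('N')  # NaN compares as False
    # True for every row of an ID that contains no 'N' at all
    no_n_in_group = ~is_n.groupby(df['ID']).transform('any')
    first_in_group = ~df.duplicated('ID')
    return df[is_n | before_n | (no_n_in_group & first_in_group)]


print(keep_status_changes(df))
```

Using `groupby('ID')` for the shift keeps the comparison from crossing into the next ID, which an ungrouped "next row" comparison would do.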

Answer 1

In case you are actually interested in using R:

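library(dplyr)
df %>%
      # EMPTY value of the following row (not grouped by ID, so it can cross ID boundaries)
      mutate(next.empty = lead(EMPTY)) %>%
      # keep rows whose status differs from the next row; the last row's NA comparison is dropped
      filter(next.empty != EMPTY) %>%
      select(-next.empty)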


#  ID EMPTY       DATE
#1  1     Y 02/01/2017
#2  1     N 01/01/2017
#3  2     Y 03/01/2017
#4  3     N 03/01/2017
#5  4     Y 03/01/2017
#6  4     N 03/01/2017

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), 
DATE = structure(c(3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("01/01/2017", 
"02/01/2017", "03/01/2017"), class = "factor")), .Names = c("ID", 
"EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L))

Answer 2

One way with dplyr in R:

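library(dplyr)
df1 %>% 
  group_by(ID) %>%  
  # keep single-row IDs, or rows up to and including the first "N"
  # that are the first occurrence of their EMPTY value within the ID
  filter(n() == 1 | (cumsum(cumsum(EMPTY == "N")) < 2 & !duplicated(EMPTY)))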
# A tibble: 6 x 3
# Groups:   ID [4]
#     ID EMPTY       DATE
#  <int> <chr>      <chr>
#1     1     Y 03/01/2017
#2     1     N 01/01/2017
#3     2     Y 03/01/2017
#4     3     N 03/01/2017
#5     4     Y 03/01/2017
#6     4     N 03/01/2017

Data:

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = c("Y", 
 "Y", "N", "Y", "N", "Y", "N", "Y", "Y"), DATE = c("03/01/2017", 
"02/01/2017", "01/01/2017", "03/01/2017", "03/01/2017", "03/01/2017", 
"03/01/2017", "03/01/2017", "03/01/2017")), .Names = c("ID", 
 "EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L
 ))
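For comparison, here is a rough pandas port of the dplyr filter above, run on the same sample data. The nested `cumsum(cumsum(...))` becomes two grouped `cumsum` calls, `n()` becomes a per-ID row count, and `!duplicated(EMPTY)` evaluated within each ID becomes `duplicated(['ID', 'EMPTY'])`. Like the R version, it keeps the first 'Y' row of ID 1 (03/01/2017) rather than the row directly before the 'N'.

```python
import pandas as pd

# Sample data from the question (same values as df1 in the R answer)
df1 = pd.DataFrame({
    'ID':    [1, 1, 1, 2, 3, 4, 4, 4, 4],
    'EMPTY': ['Y', 'Y', 'N', 'Y', 'N', 'Y', 'N', 'Y', 'Y'],
    'DATE':  ['03/01/2017', '02/01/2017', '01/01/2017'] + ['03/01/2017'] * 6,
})

is_n = df1['EMPTY'].eq('N').astype(int)
# cumsum(cumsum(EMPTY == "N")) per ID: 0 before the first 'N', 1 on it, 2 or more after it
n_seen = is_n.groupby(df1['ID']).cumsum().groupby(df1['ID']).cumsum()
# n() per ID, broadcast back to every row
group_size = df1['ID'].map(df1['ID'].value_counts())
# !duplicated(EMPTY) evaluated within each ID
first_of_value = ~df1.duplicated(['ID', 'EMPTY'])

result = df1[(group_size == 1) | ((n_seen < 2) & first_of_value)]
print(result)
```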

Answer 3

In my experience this task is much neater in R, but since you are looking for a Python solution:

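import pandas as pd
df1 = pd.DataFrame({'id': [1, 1, 1, 2, 3, 4, 4, 4, 4],
                    'empty': ['Y', 'Y', 'N', 'Y', 'N', 'Y', 'N', 'Y', 'Y'],
                    'date': ['03/01/2017', '02/01/2017', '01/01/2017'] + ['03/01/2017'] * 6})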

After loading the data into a pandas DataFrame by whatever method you prefer:

# 'empty' value of the following row (NaN on the last row)
next_empty = df1['empty'].shift(-1)
# keep rows whose status differs from the next row's; the last row has no next row to compare with
df1[next_empty.notna() & (df1['empty'] != next_empty)]

Answer 4

In MySQL, one approach is to:

1) add an auto-incrementing row ID to the table

 ALTER TABLE table1 ADD row_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

2) left join the table to itself, shifted by one row

3) add the selection conditions: (i) the current row has EMPTY = 'N', or (ii) the current row has EMPTY = 'Y' and the next row has EMPTY = 'N'

SELECT a.ID, a.Empty, a.Day 
FROM table1 a 
LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
WHERE a.Empty = 'N' or (a.Empty = 'Y' and b.Empty = 'N')

RESULT

ID  Empty   Day
1   Y   02/01/2017
1   N   01/01/2017
2   Y   03/01/2017
3   N   03/01/2017
4   Y   03/01/2017
4   N   03/01/2017

DATA

CREATE TABLE table1 (ID int, EMPTY varchar(255), DAY varchar(255));
INSERT table1 VALUES (1,'Y','03/01/2017'),(1,'Y','02/01/2017'),(1,'N','01/01/2017'),(2,'Y','03/01/2017'),(3,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'Y','03/01/2017');
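To try the self-join steps without a MySQL server, here is a sketch using SQLite through Python's built-in `sqlite3` module. This is SQLite, not MySQL. The `row_id INTEGER PRIMARY KEY AUTOINCREMENT` column stands in for the `ALTER TABLE ... AUTO_INCREMENT` of step 1, and an `ORDER BY` is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Step 1: row_id plays the role of MySQL's AUTO_INCREMENT primary key
conn.execute("CREATE TABLE table1 (row_id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "ID INT, EMPTY VARCHAR(255), DAY VARCHAR(255))")
rows = [(1, 'Y', '03/01/2017'), (1, 'Y', '02/01/2017'), (1, 'N', '01/01/2017'),
        (2, 'Y', '03/01/2017'), (3, 'N', '03/01/2017'), (4, 'Y', '03/01/2017'),
        (4, 'N', '03/01/2017'), (4, 'Y', '03/01/2017'), (4, 'Y', '03/01/2017')]
conn.executemany("INSERT INTO table1 (ID, EMPTY, DAY) VALUES (?, ?, ?)", rows)

# Steps 2 and 3: join each row to the next one and apply the selection conditions
result = conn.execute("""
    SELECT a.ID, a.EMPTY, a.DAY
    FROM table1 a
    LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
    WHERE a.EMPTY = 'N' OR (a.EMPTY = 'Y' AND b.EMPTY = 'N')
    ORDER BY a.row_id
""").fetchall()

for row in result:
    print(row)
```

The approach relies on `row_id` following the sorted order of the data. If rows were inserted in a different order, the "next row" would no longer be meaningful.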

Question

4 answers

Answer 1
2 2017-07-11 18:41:40

Answer 2
1 2017-07-11 18:41:28

Data

Answer 3
1 2017-07-12 01:12:54

Answer 4
0 2018-03-08 16:22:22
