
Keep a row only if the subsequent row meets a criterion

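I would like to know how to keep a row only when the next row in the group meets a certain condition. The data below illustrates what I am trying to achieve.

The data is sorted by ID ascending and DATE descending.

For a given ID there is at most one row with EMPTY = 'N', while there can be zero, one, or more than one row with EMPTY = 'Y'.

I want to track the date on which the EMPTY status changed: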

ID      EMPTY   DATE
1        Y     03/01/2017
1        Y     02/01/2017
1        N     01/01/2017
2        Y     03/01/2017
3        N     03/01/2017
4        Y     03/01/2017
4        N     03/01/2017
4        Y     03/01/2017
4        Y     03/01/2017

Output:

I want to keep all rows with EMPTY = 'N', along with the row directly above each of them:

ID     EMPTY   DATE
1        Y     02/01/2017
1        N     01/01/2017
2        Y     03/01/2017
3        N     03/01/2017
4        Y     03/01/2017
4        N     03/01/2017

I can use SQL or Python to do this, so solutions in either or both languages are welcome!

In case you are actually interested in using R:

library(dplyr)
df %>%
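      mutate(next.empty = lead(EMPTY, 1)) %>%   # EMPTY value of the following row (NA for the last row)
      filter(next.empty != EMPTY)  %>%          # keep rows whose status differs from the next row's
      select(-next.empty)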


#  ID EMPTY       DATE
#1  1     Y 02/01/2017
#2  1     N 01/01/2017
#3  2     Y 03/01/2017
#4  3     N 03/01/2017
#5  4     Y 03/01/2017
#6  4     N 03/01/2017

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), 
DATE = structure(c(3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("01/01/2017", 
"02/01/2017", "03/01/2017"), class = "factor")), .Names = c("ID", 
"EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L))

One approach with dplyr in R:

library(dplyr)
df1 %>% 
  group_by(ID) %>%  
  filter(n()==1 |(cumsum(cumsum(EMPTY == "N"))<2 & !duplicated(EMPTY)) )
# A tibble: 6 x 3
# Groups:   ID [4]
#     ID EMPTY       DATE
#  <int> <chr>      <chr>
#1     1     Y 03/01/2017
#2     1     N 01/01/2017
#3     2     Y 03/01/2017
#4     3     N 03/01/2017
#5     4     Y 03/01/2017
#6     4     N 03/01/2017

Data:

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = c("Y", 
 "Y", "N", "Y", "N", "Y", "N", "Y", "Y"), DATE = c("03/01/2017", 
"02/01/2017", "01/01/2017", "03/01/2017", "03/01/2017", "03/01/2017", 
"03/01/2017", "03/01/2017", "03/01/2017")), .Names = c("ID", 
 "EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L
 ))
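
For readers working in pandas, the grouped filter above can be approximated as follows. This is only a sketch: the frame is rebuilt from the data above, and the helper names (single_row, upto_first_n, first_of_value) are made up for illustration. Like the dplyr version, it keeps the first 'Y' row of each ID rather than the 'Y' row directly before the 'N'.

```python
import pandas as pd

# Same data as df1 above
df1 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 4, 4, 4, 4],
    "EMPTY": ["Y", "Y", "N", "Y", "N", "Y", "N", "Y", "Y"],
    "DATE": ["03/01/2017", "02/01/2017", "01/01/2017", "03/01/2017",
             "03/01/2017", "03/01/2017", "03/01/2017", "03/01/2017",
             "03/01/2017"],
})

# n() == 1: groups that consist of a single row
single_row = df1.groupby("ID")["EMPTY"].transform("size") == 1

# cumsum(cumsum(EMPTY == "N")) < 2: rows up to and including the first "N" per ID
is_n = (df1["EMPTY"] == "N").astype(int)
upto_first_n = is_n.groupby(df1["ID"]).cumsum().groupby(df1["ID"]).cumsum() < 2

# !duplicated(EMPTY): first occurrence of each EMPTY value within an ID
first_of_value = ~df1.duplicated(subset=["ID", "EMPTY"])

result = df1[single_row | (upto_first_n & first_of_value)].reset_index(drop=True)
print(result)
```

On the sample data this selects the same six rows as the tibble shown above.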

In my experience this is a prettier task in R, but since you are looking for a Python solution:

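import pandas as pd
# ids, empty and date are lists holding the three columns of the question's data
data = {'id': ids, 'empty': empty, 'date': date}   # renamed so the builtins dict and id are not shadowed
df1 = pd.DataFrame(data)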

Once the data is loaded into a pandas DataFrame by the method of your choice:

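lag = list(df1.loc[1:,'empty'])           # 'empty' shifted up by one row (assumes the default 0..n-1 index)
lag.append(df1['empty'].iloc[-1])         # pad with the last value so the final row never counts as a change
df1['empty_+1'] = lag
df1['check'] = df1['empty'] != df1['empty_+1']
df1.loc[df1['check']]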
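
The same comparison can also be written with Series.shift, which avoids building the shifted list by hand. The snippet below is a self-contained sketch that rebuilds the frame from the question's data; the column names id, empty and date follow the code above, while next_empty and keep are illustrative names.

```python
import pandas as pd

# Question's data, already sorted by id ascending and date descending
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 4],
    "empty": ["Y", "Y", "N", "Y", "N", "Y", "N", "Y", "Y"],
    "date": ["03/01/2017", "02/01/2017", "01/01/2017", "03/01/2017",
             "03/01/2017", "03/01/2017", "03/01/2017", "03/01/2017",
             "03/01/2017"],
})

# Value of 'empty' in the following row; the last row has no successor and gets NaN
next_empty = df["empty"].shift(-1)

# Keep rows whose status differs from the next row's, skipping the last row
keep = next_empty.notna() & (df["empty"] != next_empty)

result = df[keep].reset_index(drop=True)
print(result)
```

Note that the shift runs over the whole frame rather than per id, so the last row of one id is compared with the first row of the next, just as in the list-based version.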

In MySQL, one approach is:

1) Add an auto-increment row_id to the table

 ALTER TABLE table1 ADD row_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

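2) Left join the table to itself, matching each row with the row that follows it

3) Add the selection conditions: keep a row when (i) its Empty is 'N', or (ii) its Empty is 'Y' and the next row's Empty is 'N'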

SELECT a.ID, a.Empty, a.Day 
FROM table1 a 
LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
WHERE a.Empty = 'N' or (a.Empty = 'Y' and b.Empty = 'N')

Result

ID  Empty   Day
1   Y   02/01/2017
1   N   01/01/2017
2   Y   03/01/2017
3   N   03/01/2017
4   Y   03/01/2017
4   N   03/01/2017

Data

CREATE TABLE table1 (ID int, EMPTY varchar(255), DAY varchar(255));
INSERT table1 VALUES (1,'Y','03/01/2017'),(1,'Y','02/01/2017'),(1,'N','01/01/2017'),(2,'Y','03/01/2017'),(3,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'Y','03/01/2017');
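
Since the question also allows Python, the self-join can be tried out without a MySQL server. The sketch below uses sqlite3 from the standard library as a stand-in, which is an assumption on my part and not part of the MySQL setup above; the row_id column is declared when the table is created instead of being added with ALTER TABLE.

```python
import sqlite3

# Same rows as the INSERT statement above, in the same order
rows = [
    (1, "Y", "03/01/2017"), (1, "Y", "02/01/2017"), (1, "N", "01/01/2017"),
    (2, "Y", "03/01/2017"), (3, "N", "03/01/2017"), (4, "Y", "03/01/2017"),
    (4, "N", "03/01/2017"), (4, "Y", "03/01/2017"), (4, "Y", "03/01/2017"),
]

conn = sqlite3.connect(":memory:")
# row_id is assigned 1, 2, 3, ... in insertion order
conn.execute(
    "CREATE TABLE table1 ("
    "row_id INTEGER PRIMARY KEY AUTOINCREMENT, ID INT, EMPTY TEXT, DAY TEXT)"
)
conn.executemany("INSERT INTO table1 (ID, EMPTY, DAY) VALUES (?, ?, ?)", rows)

# Pair each row (a) with the row that follows it (b), then apply the two conditions
result = conn.execute(
    """
    SELECT a.ID, a.EMPTY, a.DAY
    FROM table1 a
    LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
    WHERE a.EMPTY = 'N' OR (a.EMPTY = 'Y' AND b.EMPTY = 'N')
    ORDER BY a.row_id
    """
).fetchall()
conn.close()

for row in result:
    print(row)
```

The approach relies on the rows being inserted in the desired sort order, because row_id is the only thing that defines which row counts as "next".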
