Keep only the row right after a subsequent row meets criteria

I'd like to know how I can keep only the rows for which a subsequent row in the same group meets a certain criterion. The following data illustrates what I am trying to achieve:

The data is sorted by ID in ascending order and by DATE in descending order.

Each ID has either one row or zero rows where EMPTY = 'N', but it can have zero, one, or more than one row where EMPTY = 'Y'.

I want to track the dates on which the EMPTY status changes:

ID      EMPTY   DATE
1        Y     03/01/2017
1        Y     02/01/2017
1        N     01/01/2017
2        Y     03/01/2017
3        N     03/01/2017
4        Y     03/01/2017
4        N     03/01/2017
4        Y     03/01/2017
4        Y     03/01/2017
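The input assumptions stated above (ID ascending, DATE descending within each ID, at most one 'N' row per ID) can be checked before applying any of the filters below. This is a minimal pandas sketch, not part of the question: `check_assumptions` is a hypothetical helper, and the MM/DD/YYYY date format is an assumption (the question does not say; for this sample, DD/MM/YYYY would give the same order).

```python
import pandas as pd

# Sample data from the table above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 4, 4, 4, 4],
    "EMPTY": ["Y", "Y", "N", "Y", "N", "Y", "N", "Y", "Y"],
    "DATE": ["03/01/2017", "02/01/2017", "01/01/2017", "03/01/2017",
             "03/01/2017", "03/01/2017", "03/01/2017", "03/01/2017",
             "03/01/2017"],
})


def check_assumptions(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if the stated input assumptions fail."""
    # ID must be sorted in ascending order.
    if not frame["ID"].is_monotonic_increasing:
        raise ValueError("ID is not sorted ascending")
    # DATE must be non-increasing within each ID (MM/DD/YYYY is an assumption).
    dates = pd.to_datetime(frame["DATE"], format="%m/%d/%Y")
    for _, group_dates in dates.groupby(frame["ID"]):
        if not group_dates.is_monotonic_decreasing:
            raise ValueError("DATE is not sorted descending within an ID")
    # Each ID may have at most one row with EMPTY == 'N'.
    n_per_id = (frame["EMPTY"] == "N").groupby(frame["ID"]).sum()
    if (n_per_id > 1).any():
        raise ValueError("an ID has more than one 'N' row")


check_assumptions(df)  # passes silently for the sample data
```

The row-order-based answers below (lead/shift, or a self-join on a row number) silently give wrong results if these assumptions do not hold, which is why checking them first can be worthwhile.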

Output:

I want to keep all the rows with EMPTY = 'N':

ID     EMPTY   DATE
1        Y     02/01/2017
1        N     01/01/2017
2        Y     03/01/2017
3        N     03/01/2017
4        Y     03/01/2017
4        N     03/01/2017

I can use either SQL or Python to do this, so solutions in either or both languages are welcome!

In case you are actually interested in using R:

library(dplyr)
df %>%
  mutate(next.empty = lead(EMPTY)) %>%   # EMPTY value of the following row
  filter(next.empty != EMPTY) %>%        # keep rows where the status changes; NA on the last row drops it
  select(-next.empty)


#  ID EMPTY       DATE
#1  1     Y 02/01/2017
#2  1     N 01/01/2017
#3  2     Y 03/01/2017
#4  3     N 03/01/2017
#5  4     Y 03/01/2017
#6  4     N 03/01/2017

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), 
DATE = structure(c(3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("01/01/2017", 
"02/01/2017", "03/01/2017"), class = "factor")), .Names = c("ID", 
"EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L))

One way with dplyr in R:

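library(dplyr)
df1 %>%
  group_by(ID) %>%
  # keep single-row IDs, or the first 'Y' and the first 'N' among rows up to and including the first 'N'
  filter(n() == 1 | (cumsum(cumsum(EMPTY == "N")) < 2 & !duplicated(EMPTY)))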
# A tibble: 6 x 3
# Groups:   ID [4]
#     ID EMPTY       DATE
#  <int> <chr>      <chr>
#1     1     Y 03/01/2017
#2     1     N 01/01/2017
#3     2     Y 03/01/2017
#4     3     N 03/01/2017
#5     4     Y 03/01/2017
#6     4     N 03/01/2017

Data:

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L), EMPTY = c("Y", 
 "Y", "N", "Y", "N", "Y", "N", "Y", "Y"), DATE = c("03/01/2017", 
"02/01/2017", "01/01/2017", "03/01/2017", "03/01/2017", "03/01/2017", 
"03/01/2017", "03/01/2017", "03/01/2017")), .Names = c("ID", 
 "EMPTY", "DATE"), class = "data.frame", row.names = c(NA, -9L
 ))
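For readers working in Python, the grouped dplyr filter above can be translated to pandas roughly as follows. This is a sketch rather than part of the original answer: `keep_rows` is a hypothetical name, and each mask mirrors one term of the R expression. Like the R version, it keeps the first 'Y' row of ID 1 (03/01/2017), not the 02/01/2017 row shown in the question's expected output.

```python
import pandas as pd

# Same sample data as df1 above.
df1 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 4, 4, 4, 4],
    "EMPTY": ["Y", "Y", "N", "Y", "N", "Y", "N", "Y", "Y"],
    "DATE": ["03/01/2017", "02/01/2017", "01/01/2017", "03/01/2017",
             "03/01/2017", "03/01/2017", "03/01/2017", "03/01/2017",
             "03/01/2017"],
})


def keep_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pandas translation of the grouped dplyr filter."""
    ids = df["ID"]
    # n() == 1: the row is the only one for its ID
    single = ids.map(ids.value_counts()) == 1
    # cumsum(cumsum(EMPTY == "N")) < 2, computed within each ID:
    # true for every row up to and including the first 'N'
    is_n = (df["EMPTY"] == "N").astype(int)
    twice = is_n.groupby(ids).cumsum().groupby(ids).cumsum()
    # !duplicated(EMPTY) within each ID: first occurrence of each value
    first_of_value = ~df.duplicated(subset=["ID", "EMPTY"])
    return df[single | ((twice < 2) & first_of_value)]


result = keep_rows(df1)
print(result)
```

Using `df.duplicated(subset=["ID", "EMPTY"])` on the whole frame is equivalent to calling `duplicated` inside each group, because a value can only be a duplicate of an earlier row with the same ID.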

In my experience this is a much prettier task in R, but since you are looking for a Python solution:

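import pandas as pd
data = {'id': [1, 1, 1, 2, 3, 4, 4, 4, 4], 'empty': list('YYNYNYNYY'),  # question's sample data
        'date': ['03/01/2017', '02/01/2017', '01/01/2017'] + ['03/01/2017'] * 6}
df1 = pd.DataFrame(data)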

After loading the data into a pandas DataFrame by the method of your choice:

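# Next row's value for each row (shift(-1) plays the role of R's lead())
next_empty = df1['empty'].shift(-1)
# Keep rows whose next value differs; the last row has no next row (NaN),
# and NaN != 'Y' is True in pandas, so exclude it explicitly
df1['check'] = next_empty.notna() & (df1['empty'] != next_empty)
df1.loc[df1['check']]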

In MySQL, one approach is to:

1) add an auto-increment row_id column to the table

 ALTER TABLE table1 ADD row_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

2) LEFT JOIN the table to itself, shifted by one row (a.row_id + 1 = b.row_id)

3) add the selection conditions: (i) the current row has EMPTY = 'N', or (ii) the current row has EMPTY = 'Y' and the next row has EMPTY = 'N'

SELECT a.ID, a.EMPTY, a.DAY
FROM table1 a
LEFT JOIN table1 b ON a.row_id + 1 = b.row_id
WHERE a.EMPTY = 'N' OR (a.EMPTY = 'Y' AND b.EMPTY = 'N')
ORDER BY a.row_id;

Result:

ID  Empty   Day
1   Y   02/01/2017
1   N   01/01/2017
2   Y   03/01/2017
3   N   03/01/2017
4   Y   03/01/2017
4   N   03/01/2017

Data:

-- EMPTY is a reserved word in MySQL 8.0.4 and later, so it is quoted here
CREATE TABLE table1 (ID int, `EMPTY` varchar(255), DAY varchar(255));
INSERT table1 VALUES (1,'Y','03/01/2017'),(1,'Y','02/01/2017'),(1,'N','01/01/2017'),(2,'Y','03/01/2017'),(3,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'N','03/01/2017'),(4,'Y','03/01/2017'),(4,'Y','03/01/2017');
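The MySQL approach can be tried without a MySQL server by using Python's built-in `sqlite3` module. This is a sketch with one stated assumption: SQLite's implicit `rowid` stands in for the `AUTO_INCREMENT` `row_id` column. That holds here because the rows are inserted into a fresh table in order, so they receive rowids 1 to 9. Only the join condition and the WHERE clause are taken from the answer.

```python
import sqlite3

# In-memory database with the same table and rows as the MySQL example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, EMPTY TEXT, DAY TEXT)")
rows = [(1, "Y", "03/01/2017"), (1, "Y", "02/01/2017"), (1, "N", "01/01/2017"),
        (2, "Y", "03/01/2017"), (3, "N", "03/01/2017"), (4, "Y", "03/01/2017"),
        (4, "N", "03/01/2017"), (4, "Y", "03/01/2017"), (4, "Y", "03/01/2017")]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# Self-join each row to the row inserted right after it (rowid + 1).
query = """
SELECT a.ID, a.EMPTY, a.DAY
FROM table1 a
LEFT JOIN table1 b ON a.rowid + 1 = b.rowid
WHERE a.EMPTY = 'N' OR (a.EMPTY = 'Y' AND b.EMPTY = 'N')
ORDER BY a.rowid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

For the last row the LEFT JOIN finds no partner, so `b.EMPTY` is NULL and the condition `b.EMPTY = 'N'` is not true; that row is filtered out, just as the R `lead()` answer drops its last row.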
