
Subset df on value and subsequent row - pandas

I know this is on SO somewhere, but I can't seem to find it. I want to subset a df on a specific value and also include the rows for the next unique value that follows it. Using the code below, I can return the rows where ID equals A, but I'm hoping to also return the rows for the next unique value, which is B.

Note: The subsequent unique value may not be B, and it may span a varying number of rows, so I need a function that finds and returns all subsequent unique values.

import pandas as pd

df = pd.DataFrame({   
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],             
    'ID' : ['A','A','B','B','C','C','A','A','B','B','C','C'],      
    'Val' : [2.0,5.0,2.5,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,1.0],   
    })

df = df[df['ID'] == 'A']

intended output:

    Time ID  Val
0      1  A  2.0
1      1  A  5.0
2      1  B  2.5
3      1  B  2.0
4      2  A  1.0
5      2  A  6.0
6      2  B  4.0
7      2  B  2.0

OK OP, let me try this again: you want all the rows whose ID is "A" (the base condition), plus all the rows whose ID follows an "A" row at some point, right?

Then,

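# rows whose ID is "A"
is_A = df["ID"] == "A"
# rows that are not "A" but come directly after an "A" row
not_A_follows_from_A = (df["ID"] != "A") & (df["ID"].shift() == "A")
# every ID that is "A" or directly follows an "A" somewhere in the frame
candidates = df["ID"].loc[is_A | not_A_follows_from_A].unique()
df.loc[df["ID"].isin(candidates)]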

Should work as intended.
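Since the question asks for a function, the same steps could be wrapped up roughly like this. This is only a sketch: the name subset_with_followers and its id_col parameter are made up for illustration, and the reset_index call is added just to match the index shown in the intended output.

```python
import pandas as pd


def subset_with_followers(df, value, id_col="ID"):
    """Hypothetical helper: keep rows equal to `value` plus rows whose ID
    directly follows a `value` row somewhere in the frame."""
    is_value = df[id_col] == value
    # rows that are not `value` but whose previous row is `value`
    follows_value = ~is_value & (df[id_col].shift() == value)
    # IDs to keep: `value` itself and anything seen right after it
    candidates = df.loc[is_value | follows_value, id_col].unique()
    return df[df[id_col].isin(candidates)].reset_index(drop=True)


df = pd.DataFrame({
    'Time': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'ID': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Val': [2.0, 5.0, 2.5, 2.0, 2.0, 1.0, 1.0, 6.0, 4.0, 2.0, 5.0, 1.0],
})

result = subset_with_followers(df, 'A')
print(result)
```

On the df from the question this keeps the A and B rows for both Time values. Note that the selection is global: any ID that follows the value anywhere in the frame is kept everywhere it appears.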

Edit: example

df = pd.DataFrame({
 'Time': [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1],
 'ID': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'A', 'E', 'E', 'E', 'A', 'F'],
 'Val': [7, 2, 7, 5, 1, 6, 7, 3, 2, 4, 7, 8, 2]})
is_A = df["ID"] == "A"
not_A_follows_from_A = (df["ID"] != "A") & (df["ID"].shift() == "A")
candidates = df["ID"].loc[is_A | not_A_follows_from_A].unique()
df.loc[df["ID"].isin(candidates)]

outputs this:

    Time ID  Val
0      1  A    7
1      1  A    2
2      1  B    7
3      0  B    5
7      1  A    3
8      0  E    2
9      0  E    4
10     1  E    7
11     1  A    8
12     1  F    2

Let us try drop_duplicates, then groupby with head to select how many unique IDs to keep per Time, and merge the result back onto df:

out = df.merge(df[['Time','ID']].drop_duplicates().groupby('Time').head(2))
   Time ID  Val
0     1  A  2.0
1     1  A  5.0
2     1  B  2.5
3     1  B  2.0
4     2  A  1.0
5     2  A  6.0
6     2  B  4.0
7     2  B  2.0
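One caveat: head(2) keeps the first two unique IDs in each Time group, so the one-liner relies on A coming first in every group. If that is not guaranteed, a variant along these lines might help. It is a sketch only: the function name keep_value_and_next, its parameters, and the sample frame are all made up for illustration.

```python
import pandas as pd


def keep_value_and_next(df, value, group_col="Time", id_col="ID"):
    """Hypothetical helper: per group, keep `value` and the next unique ID
    that appears after it."""
    # unique (group, ID) pairs in order of first appearance
    pairs = df[[group_col, id_col]].drop_duplicates()
    # position of each unique ID within its group: 0, 1, 2, ...
    pos = pairs.groupby(group_col).cumcount()
    # position of `value` within each group (NaN if the group lacks it)
    target = (pos.where(pairs[id_col] == value)
                 .groupby(pairs[group_col])
                 .transform("min"))
    # keep the target ID and the unique ID right after it
    keep = pairs[(pos >= target) & (pos <= target + 1)]
    return df.merge(keep, on=[group_col, id_col])


# made-up sample where A is not the first ID in the Time == 1 group
sample = pd.DataFrame({
    'Time': [1, 1, 1, 1, 2, 2, 2, 2],
    'ID': ['C', 'A', 'B', 'D', 'A', 'A', 'B', 'C'],
    'Val': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

out = keep_value_and_next(sample, 'A')
print(out)
```

Groups that do not contain the value at all drop out entirely, because comparisons against NaN are False.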

