How to filter a time series pandas dataframe based on two columns?

Question

I have a time series data frame with patient IDs in one column and years (1-3) in another column. Using pandas, I want to keep, for each unique patient ID, only the rows of patients that have entries for all of years 1-3.

Answer 1

It can be done in one line like so:

import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 333, 333, 333],
    "Year": [1, 2, 3, 1, 2, 1, 2, 3]})

# keep only the IDs whose distinct years are exactly [1, 2, 3]
df = df.groupby('ID').filter(lambda g: sorted(set(g["Year"].values)) == [1, 2, 3])
print(df)
#    ID  Year
#0  111     1
#1  111     2
#2  111     3
#5  333     1
#6  333     2
#7  333     3
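
Note that the exact comparison with [1, 2, 3] also drops an ID that has an extra year (for example a Year 4 row). If the intent is "has at least years 1-3", a subset check is one possible variant. The following is only a minimal sketch under that assumption; the names required_years and has_all_years, and the extra Year 4 row in the sample data, are made up for illustration and are not from the question.

```python
import pandas as pd

# Sample data: ID 111 has an extra Year 4, ID 222 is missing Year 3.
df = pd.DataFrame({
    "ID":   [111, 111, 111, 111, 222, 222, 333, 333, 333],
    "Year": [1, 2, 3, 4, 1, 2, 1, 2, 3],
})

# Hypothetical name: the years every patient must have.
required_years = {1, 2, 3}

def has_all_years(group):
    # True when every required year appears at least once in the group.
    return required_years.issubset(group["Year"])

result = df.groupby("ID").filter(has_all_years)
print(result)
```

With this check, ID 111 is kept despite the extra year, whereas the exact comparison above would remove it.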

Answer 2

Assuming that Year can only take the values 1 through 3, you can run:

result = df.groupby('ID').filter(lambda grp: grp.Year.unique().size == 3)

unique() is needed to exclude cases where there are, e.g., 3 rows for a particular ID but all from the same Year.
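
To illustrate why counting rows is not enough, here is a small sketch comparing a plain row count with a distinct-year count. The sample data (including ID 444) is invented for the illustration, and it relies on the same assumption as above, namely that Year only takes the values 1 through 3. The last lines show a boolean-mask form built with transform("nunique"), which should select the same rows without calling a Python lambda per group.

```python
import pandas as pd

# ID 444 has three rows, but all of them are from Year 1.
df = pd.DataFrame({
    "ID":   [111, 111, 111, 444, 444, 444],
    "Year": [1, 2, 3, 1, 1, 1],
})

# Counting rows only: ID 444 wrongly passes.
by_row_count = df.groupby("ID").filter(lambda grp: len(grp) == 3)

# Counting distinct years: ID 444 is excluded.
by_unique_years = df.groupby("ID").filter(lambda grp: grp["Year"].nunique() == 3)

# Same selection expressed as a row-aligned boolean mask.
mask = df.groupby("ID")["Year"].transform("nunique") == 3
by_mask = df[mask]

print(by_unique_years)
```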

2 answers

solution1 2 ACCEPTED 2020-06-04 17:30:37

solution2 0 2020-06-04 17:41:07
