Pandas: Dataframe drop data based on a condition

I have a df with the columns name and subject:

name  subject

mark   social  
mark   social
mark   maths
mark   social
mark   maths
mark   social
mark   social
mark   social
mark   social
mark   social
mark   maths
mark   social
mark   social
mark   social
mark   maths
mark   social
mark   social
mark   social
mark   social
mark   social
mark   social
mark   social
mark   math

If the subjects appear in the order social, social, maths, I need to remove the first social of that sequence. Even when several social rows come before a maths row, only the social that starts a social, social, maths sequence should be removed.

In the last row of your sample you have 'math'. I assume it should be 'maths'. Then you could do:

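# Keep every row except a 'social' that is followed by 'social' and then 'maths'.
df.loc[~(
    (df['subject'] == 'social')                 # this row is 'social'
    & (df['subject'].shift(-1) == 'social')     # next row is 'social'
    & (df['subject'].shift(-2) == 'maths'))     # row two below is 'maths'
]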

The condition inside df.loc selects the rows we want to drop, and the negation symbol ~ at the beginning inverts it. A row is therefore removed whenever its subject is 'social', the row below it is 'social', and the row two positions below is 'maths'. For the sample data this drops rows 0, 8, 12 and 20.
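To check those row numbers, here is a minimal self-contained sketch. It rebuilds the sample from the question under the assumption that the final 'math' is a typo for 'maths'. The names subjects, to_drop, dropped and result are introduced here only for illustration.

```python
import pandas as pd

# Sample data from the question; the last value is assumed to be 'maths'.
subjects = (
    ["social", "social", "maths", "social", "maths"]
    + ["social"] * 5 + ["maths"]
    + ["social"] * 3 + ["maths"]
    + ["social"] * 7 + ["maths"]
)
df = pd.DataFrame({"name": ["mark"] * len(subjects), "subject": subjects})

s = df["subject"]
# True for a 'social' row followed by 'social' and then 'maths'.
to_drop = (s == "social") & (s.shift(-1) == "social") & (s.shift(-2) == "maths")

dropped = df.index[to_drop].tolist()
result = df.loc[~to_drop]

print(dropped)  # [0, 8, 12, 20]
```

Storing the mask in a variable first makes it easy to inspect which rows match before actually filtering them out.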

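Another solution, which encodes each subject as a number (social is 2, math or maths is 1), looks for the window [2, 2, 1] with a rolling apply per name, and then shifts the flag back onto the first row of the window: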

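# Encode each row: 'math'/'maths' -> 1, 'social' -> 2.
df["tmp"] = df["subject"].str.contains("math") + (
    df["subject"].str.contains("social") * 2
)
# Per name, flag the last row of every 3-row window equal to [2, 2, 1].
# Note: assigning .values assumes the frame is already sorted by name.
df["tmp"] = (
    df.groupby("name")
    .rolling(3)["tmp"]
    .apply(lambda x: x.eq([2, 2, 1]).all())
    .values
)
# Move the flag back two rows, onto the first 'social' of the pattern.
df["tmp"] = df.groupby("name")["tmp"].transform(lambda x: x.shift(-2))
print(df[df["tmp"] != 1].drop(columns=["tmp"]))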

Prints:

    name subject
1   mark  social
2   mark   maths
3   mark  social
4   mark   maths
5   mark  social
6   mark  social
7   mark  social
9   mark  social
10  mark   maths
11  mark  social
13  mark  social
14  mark   maths
15  mark  social
16  mark  social
17  mark  social
18  mark  social
19  mark  social
21  mark  social
22  mark    math
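The groupby in this solution keeps a pattern from spanning two different names. The shift-based mask from the first answer can be made group-aware in the same spirit by shifting within each name. The following is only a sketch under assumptions: the two-name sample (including the name john) is hypothetical and not from the question, and the variables plain and grouped are illustrative.

```python
import pandas as pd

# Hypothetical sample: the social, social, maths run crosses the mark/john boundary.
df = pd.DataFrame({
    "name": ["mark", "mark", "john", "john", "john"],
    "subject": ["social", "social", "maths", "social", "maths"],
})

s = df["subject"]
# Plain shift ignores the name column, so row 0 is flagged.
plain = s.eq("social") & s.shift(-1).eq("social") & s.shift(-2).eq("maths")

# Shifting within each name: values never leak from one name into another.
g = df.groupby("name")["subject"]
grouped = s.eq("social") & g.shift(-1).eq("social") & g.shift(-2).eq("maths")

print(plain.tolist())    # [True, False, False, False, False]
print(grouped.tolist())  # [False, False, False, False, False]

result = df.loc[~grouped]
```

With a single name, as in the question, both masks give the same result, so the grouped form only matters once the frame holds more than one name.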
