Filter a DataFrame based on multiple column values

I have a problem.

Consider the DataFrame below:

Company Year  Status
A       2021  Unpaid
B       2021  Paid
C       2021  Unpaid
D       2021  Paid
A       2020  Unpaid
B       2020  Unpaid
C       2020  Paid
D       2020  Paid

I want to get a list of the companies that were paid in 2020 but unpaid in 2021 (so just C). I can do this in Excel with no problem but can't figure it out in pandas. I'm stumped.

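You can pivot so that each Status value becomes its own column holding the year, then filter with query. Note that pivot_table averages the years by default, so a company with the same status in both years gets 2020.5 and drops out of the filter.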

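import pandas as pd


data = {
    "Company": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "Year": [2021, 2021, 2021, 2021, 2020, 2020, 2020, 2020],
    "Status": ["Unpaid", "Paid", "Unpaid", "Paid", "Unpaid", "Unpaid", "Paid", "Paid"]
}

answer = (
    pd.DataFrame(data)
    # One row per company, one column per status; each cell is the mean
    # (pivot_table's default aggregation) of the years with that status.
    .pivot_table(index="Company", columns="Status", values="Year")
    .reset_index()
    # Keep companies whose only "Paid" year is 2020 and only "Unpaid" year is 2021.
    .query("Paid == 2020 & Unpaid == 2021")
    ["Company"].tolist()
)
print(answer)  # ['C']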
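
An alternative that may be easier to read is to pivot the other way round, with one column per year and the status as the cell value, and then compare the columns directly. This is only a sketch, not part of the original answer: the names wide, mask and paid_then_unpaid are made up for illustration, and it assumes each (Company, Year) pair occurs exactly once, since pivot raises on duplicates. The mask selects companies that were paid in 2020 and unpaid in 2021, which is the combination that yields C in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "Year": [2021, 2021, 2021, 2021, 2020, 2020, 2020, 2020],
    "Status": ["Unpaid", "Paid", "Unpaid", "Paid", "Unpaid", "Unpaid", "Paid", "Paid"],
})

# One row per company, one column per year, each cell holding that year's status.
# Assumption: (Company, Year) pairs are unique; pivot() raises otherwise.
wide = df.pivot(index="Company", columns="Year", values="Status")

# Boolean mask: paid in 2020 and unpaid in 2021.
mask = (wide[2020] == "Paid") & (wide[2021] == "Unpaid")
paid_then_unpaid = wide.index[mask].tolist()
print(paid_then_unpaid)  # ['C']
```

Swapping the two status strings in the mask gives the opposite transition (unpaid in 2020, paid in 2021), which selects B for this data.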
