Filter a DataFrame based on multiple column values

Question

I have the following dataframe:

Company Year  Status
A       2021  Unpaid
B       2021  Paid
C       2021  Unpaid
D       2021  Paid
A       2020  Unpaid
B       2020  Unpaid
C       2020  Paid
D       2020  Paid

I want to get a list of the companies that were paid in 2020 but unpaid in 2021 (so just C). I can do this in Excel without any trouble, but I can't figure it out in pandas and I'm stumped.

Answer 1

You can pivot the data so that each status becomes a column, then filter with query:

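import pandas as pd


data = {
    "Company": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "Year": [2021, 2021, 2021, 2021, 2020, 2020, 2020, 2020],
    "Status": ["Unpaid", "Paid", "Unpaid", "Paid", "Unpaid", "Unpaid", "Paid", "Paid"],
}

answer = (
    pd
    .DataFrame(data)
    # One row per company, with "Paid" and "Unpaid" columns holding the year.
    # pivot_table averages by default, so a company with the same status in
    # both years gets 2020.5 in that column and NaN in the other.
    .pivot_table(index="Company", columns="Status", values="Year")
    .reset_index()
    # Keep companies whose Paid year is 2020 and whose Unpaid year is 2021.
    .query("Paid == 2020 & Unpaid == 2021")
    ["Company"].tolist()
)
print(answer)  # ['C']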
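
The query above works because pivot_table averages the years by default, which is easy to miss. As an alternative sketch (not part of the original answer), you could pivot on Year instead, so each cell holds the status itself and the filter reads like the question. This assumes each company has exactly one row per year; the names wide, paid_then_unpaid and unpaid_then_paid are only illustrative.

```python
import pandas as pd

data = {
    "Company": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "Year": [2021, 2021, 2021, 2021, 2020, 2020, 2020, 2020],
    "Status": ["Unpaid", "Paid", "Unpaid", "Paid", "Unpaid", "Unpaid", "Paid", "Paid"],
}
df = pd.DataFrame(data)

# One row per company, one column per year, each cell is that year's status.
# pivot (unlike pivot_table) raises if a Company/Year pair appears twice.
wide = df.pivot(index="Company", columns="Year", values="Status")

# Companies that were paid in 2020 but unpaid in 2021.
mask = (wide[2020] == "Paid") & (wide[2021] == "Unpaid")
paid_then_unpaid = wide.loc[mask].index.tolist()
print(paid_then_unpaid)  # ['C']

# The reverse condition: unpaid in 2020 but paid in 2021.
mask = (wide[2020] == "Unpaid") & (wide[2021] == "Paid")
unpaid_then_paid = wide.loc[mask].index.tolist()
print(unpaid_then_paid)  # ['B']
```

Comparing status strings directly avoids relying on the averaged year values, and swapping the two conditions gives the opposite transition.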
