
How do I remove rows based on multiple conditions in a Python / pandas DataFrame?

I have a table which looks something like this:

Identified  Software          Version  Date
0           Microsoft Office  2        2022-05-25
0           Microsoft Office  1        2022-03-21
0           Adobe Photoshop   2        2022-04-20
1           Adobe Photoshop   1        2021-04-04

I created the 'Identified' column using this code:

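import pandas as pd

# Parse 'Date' as datetime64 so it can be compared with a Timestamp
df = pd.read_csv('version-data.csv', encoding='utf8', parse_dates=['Date'])

# Cutoff: midnight today minus one year
olderdata = pd.Timestamp.today().normalize() - pd.DateOffset(years=1)

# 1 where the date is on or before the cutoff, else 0
df['Identified'] = (df['Date'] <= olderdata).astype(int)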

This marks every row whose date is at least one year old. Now I want to create a new dataframe that contains all rows for each software package that has at least one identified row. Here is the output I am looking for:

Identified  Software         Version  Date
0           Adobe Photoshop  2        2022-04-20
1           Adobe Photoshop  1        2021-04-04

How do I achieve this?

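You can use groupby.filter, which keeps all rows of a group when the function returns True for that group. Here a 'Software' group is kept if any of its rows has Identified equal to 1: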

out = df.groupby('Software').filter(lambda x: (x.Identified==1).any())

print(out)

   Identified          Software   Version        Date
2           0   Adobe Photoshop         2  2022-04-20
3           1   Adobe Photoshop         1  2021-04-04
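
For anyone who wants to try this without the CSV file, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the Identified flags are hard-coded, not recomputed from today's date) and also shows a boolean-mask variant using groupby.transform, which is an alternative to the approach above rather than part of the original answer. The names out_filter, mask and out_mask are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; Identified is hard-coded here
# so the result does not depend on the date the code is run.
df = pd.DataFrame({
    "Identified": [0, 0, 0, 1],
    "Software": ["Microsoft Office", "Microsoft Office",
                 "Adobe Photoshop", "Adobe Photoshop"],
    "Version": [2, 1, 2, 1],
    "Date": pd.to_datetime(["2022-05-25", "2022-03-21",
                            "2022-04-20", "2021-04-04"]),
})

# Option 1: groupby.filter keeps every row of a group for which
# the function returns True.
out_filter = df.groupby("Software").filter(
    lambda g: (g["Identified"] == 1).any()
)

# Option 2 (alternative): compute a per-row boolean mask that is True
# when any row in the same Software group is identified, then index with it.
mask = df["Identified"].eq(1).groupby(df["Software"]).transform("any")
out_mask = df[mask]

print(out_filter)
```

Both variants keep the original index and row order. The mask version avoids calling a Python lambda once per group, which may matter on tables with many groups.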


 