Python: How to filter out rows based on a condition from 2 columns

I have a data frame DF in Python and I want to filter its rows based on 2 columns.

In particular, I want to remove the rows where orderdate is earlier than startdate.

How can I invert the condition in the following code, which currently keeps exactly the rows I want to remove?

DF = DF.loc[DF['orderdate']<DF['startdate']]

I could rewrite the code as below, but that also drops the rows that contain NaT, and I want to keep them:

DF = DF.loc[DF['orderdate']>=DF['startdate']]

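Putting ~ in front of the parenthesized condition negates the boolean mask, so the rows where orderdate is earlier than startdate are removed and every other row is kept. A comparison involving NaT evaluates to False, so the negated mask is True for those rows and they are kept as well.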

DF = DF.loc[~(DF['orderdate']<DF['startdate'])]
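To make the NaT behaviour concrete, here is a minimal sketch on made-up data (the sample dates are hypothetical; only the column names come from the question). It compares the >= filter with the negated < filter.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
DF = pd.DataFrame({
    "orderdate": pd.to_datetime(["2024-01-05", "2024-01-01", None, "2024-02-01"]),
    "startdate": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-01", None]),
})

# Comparisons involving NaT evaluate to False, so >= drops the NaT rows.
kept_ge = DF.loc[DF["orderdate"] >= DF["startdate"]]

# Negating the < mask turns those False values into True, so NaT rows are kept.
kept_not_lt = DF.loc[~(DF["orderdate"] < DF["startdate"])]

print(kept_ge.index.tolist())      # [0]
print(kept_not_lt.index.tolist())  # [0, 2, 3]
```

Row 1 is the only row whose orderdate is earlier than its startdate, so it is the only row the negated filter removes.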

1 - loc selects the rows where 'orderdate' is earlier than 'startdate', that is, the rows to remove. The .index attribute returns the index labels of those rows, which are stored in ids. Rows where either date is NaT compare as False, so they are not selected.

2 - The drop method deletes the rows whose index labels are in ids. With inplace=True the operation modifies the DataFrame itself; with inplace=False (the default) it returns a new DataFrame and leaves the original unchanged.

# Get the index labels of the rows where orderdate < startdate
ids = DF.loc[DF['orderdate'] < DF['startdate']].index
# Delete the rows with these index labels from the DataFrame
DF.drop(ids, inplace=True)
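As a rough illustration of the index-and-drop approach applied to the goal in the question (remove rows where orderdate is earlier than startdate, keep NaT rows), the sketch below uses made-up sample data and the non-inplace form of drop. The name result is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
DF = pd.DataFrame({
    "orderdate": pd.to_datetime(["2024-01-05", "2024-01-01", None]),
    "startdate": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-01"]),
})

# Index labels of the rows to remove: orderdate earlier than startdate.
ids = DF.loc[DF["orderdate"] < DF["startdate"]].index

# Without inplace=True, drop returns a new DataFrame and leaves DF untouched.
result = DF.drop(ids)

print(result.index.tolist())  # [0, 2]
print(len(DF))                # 3
```

Note that drop works on index labels, so this sketch assumes the index is unique. With duplicate labels, every row sharing a dropped label would be removed, and the boolean-mask form is the safer choice.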
