Python: How to filter out rows based on a condition from 2 columns
I have a data frame DF in Python and I want to filter its rows based on 2 columns.
In particular, I want to remove the rows where orderdate is earlier than startdate.
How can I reverse (negate) the condition inside the following code to achieve what I want?
DF = DF.loc[DF['orderdate']<DF['startdate']]
I could rewrite the code as below, but it would not keep the rows that contain NaT, and I want to keep them.
DF = DF.loc[DF['orderdate']>=DF['startdate']]
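The NaT problem described above can be reproduced with a minimal sketch. The sample values and the lowercase name df are assumptions made up for illustration; only the column names come from the question. Any comparison against NaT evaluates to False, which is why the >= filter loses those rows.

```python
import pandas as pd

# Hypothetical sample data; rows 2 and 3 each contain a NaT.
df = pd.DataFrame({
    "orderdate": pd.to_datetime(["2021-01-05", "2021-01-01", None, "2021-02-01"]),
    "startdate": pd.to_datetime(["2021-01-01", "2021-01-10", "2021-01-01", None]),
})

# A comparison involving NaT is False, so rows 2 and 3 fail the >= test
# along with row 1, where orderdate really is earlier than startdate.
mask = df["orderdate"] >= df["startdate"]
kept = df.loc[mask]

print(mask.tolist())
print(kept.index.tolist())
```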
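Putting ~ in front of the parenthesized condition negates it, so the rows where orderdate is earlier than startdate are removed and all other rows are kept. Because a comparison involving NaT evaluates to False, the negated condition is True for those rows, so the rows with NaT are kept as well.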
DF = DF.loc[~(DF['orderdate']<DF['startdate'])]
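As a rough check of this behaviour, here is a small sketch on made-up data (the values and the names df, too_early and result are illustrative assumptions, not part of the question). It shows that the negated mask drops only the row where orderdate is genuinely earlier and leaves the NaT rows in place.

```python
import pandas as pd

# Hypothetical sample data; rows 2 and 3 each contain a NaT.
df = pd.DataFrame({
    "orderdate": pd.to_datetime(["2021-01-05", "2021-01-01", None, "2021-02-01"]),
    "startdate": pd.to_datetime(["2021-01-01", "2021-01-10", "2021-01-01", None]),
})

# True only where both dates are present and orderdate is earlier.
too_early = df["orderdate"] < df["startdate"]

# Negating the mask keeps everything else, including the NaT rows.
result = df.loc[~too_early]

print(too_early.tolist())
print(result.index.tolist())
```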
1 - loc compares each value in the 'orderdate' column with the value in the 'startdate' column of the same row. Where the condition is true, the row is selected; the index labels of the selected rows are then stored in ids.
2 - The drop method deletes rows from the data frame. Its arguments are the array of row index labels and inplace=True, which makes the operation modify the data frame itself; with inplace=False it returns a modified copy instead.
# Get the index labels of the rows where orderdate is earlier than startdate
# (rows containing NaT compare as False, so they are not selected)
ids = DF.loc[DF['orderdate'] < DF['startdate']].index
# Delete these rows from the data frame
DF.drop(ids, inplace=True)
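A minimal sketch of the drop-based approach applied to the question's goal, assuming the rows to delete are those where orderdate is earlier than startdate, so the labels are collected with <. The sample data and the names df and result are made up for illustration. drop removes rows by index label, so this sketch also assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical sample data; rows 2 and 3 each contain a NaT.
df = pd.DataFrame({
    "orderdate": pd.to_datetime(["2021-01-05", "2021-01-01", None, "2021-02-01"]),
    "startdate": pd.to_datetime(["2021-01-01", "2021-01-10", "2021-01-01", None]),
})

# Index labels of the rows to delete: orderdate strictly earlier than startdate.
ids = df.loc[df["orderdate"] < df["startdate"]].index

# Without inplace=True, drop returns a new data frame and leaves df unchanged.
result = df.drop(ids)

print(ids.tolist())
print(result.index.tolist())
```

On this data the result matches what the ~ mask produces. With inplace=True, drop returns None, so its return value should not be assigned back to the variable.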