
How to filter DF based on multiple conditions

I have a DataFrame that I am trying to filter using multiple conditions:

remove_outliers[remove_outliers['outlier_residual'] > (Q3 + 1.5 * IQR) and remove_outliers['season'] =='Autumn']

When I try this, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-304-141eedb8a594> in <module>
----> 1 remove_outliers[remove_outliers['outlier_residual'] > (Q3 + 1.5 * IQR) and remove_outliers['season'] =='Autumn']

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in __nonzero__(self)
   1328     def __nonzero__(self):
   1329         raise ValueError(
-> 1330             f"The truth value of a {type(self).__name__} is ambiguous. "
   1331             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1332         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the correct way to do this? I appreciate any help or advice.

remove_outliers.loc[(remove_outliers['outlier_residual'] > (Q3 + 1.5 * IQR)) & (remove_outliers['season'] =='Autumn')]

There is no need to nest .loc inside .loc.
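To make the `&` filter above reproducible, here is a minimal sketch on made-up data. The sample values and the way Q1, Q3 and IQR are computed are assumptions, since the question does not show them. Python's `and` asks each Series for a single truth value, which is what raises the ValueError; `&` combines the two masks element by element. The parentheses matter because `&` binds more tightly than `>` and `==`.

```python
import pandas as pd

# Hypothetical sample data; the real remove_outliers frame is not shown in the question.
remove_outliers = pd.DataFrame({
    "outlier_residual": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 120.0],
    "season": ["Autumn", "Winter", "Autumn", "Spring", "Autumn",
               "Winter", "Summer", "Autumn", "Autumn", "Winter"],
})

# Assumed threshold definition (quartiles of the residual column).
Q1 = remove_outliers["outlier_residual"].quantile(0.25)
Q3 = remove_outliers["outlier_residual"].quantile(0.75)
IQR = Q3 - Q1

# Each comparison yields a boolean Series with one value per row.
is_high = remove_outliers["outlier_residual"] > (Q3 + 1.5 * IQR)
is_autumn = remove_outliers["season"] == "Autumn"

# `and` tries to collapse each Series to one bool and fails.
try:
    remove_outliers[is_high and is_autumn]
except ValueError as err:
    print("and fails:", err)

# `&` combines the masks row by row.
autumn_outliers = remove_outliers.loc[is_high & is_autumn]
print(autumn_outliers)
```

Naming the two masks first also sidesteps the precedence problem, because each comparison is finished before `&` is applied.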

I guess you are missing a pair of brackets. Let me know whether it works now:

remove_outliers.loc[(remove_outliers.loc[:,'outlier_residual'] > (Q3 + 1.5 * IQR)) & (remove_outliers.loc[:,'season'] =='Autumn'),:]

PS: I have used .loc throughout as a matter of good practice.
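This answer selects each column with `.loc[:, 'col']` inside the outer `.loc`. As a rough check that this builds the same mask as plain bracket access, here is a small sketch; the frame `df` and the fixed `threshold` are hypothetical stand-ins for `remove_outliers` and `Q3 + 1.5 * IQR`.

```python
import pandas as pd

# Hypothetical stand-in for the remove_outliers frame.
df = pd.DataFrame({
    "outlier_residual": [0.5, 9.0, 12.0],
    "season": ["Autumn", "Autumn", "Winter"],
})
threshold = 5.0  # hypothetical stand-in for Q3 + 1.5 * IQR

# Mask built with plain column access.
mask_plain = (df["outlier_residual"] > threshold) & (df["season"] == "Autumn")

# Mask built with .loc column selection, as in the answer above.
mask_loc = (df.loc[:, "outlier_residual"] > threshold) & (df.loc[:, "season"] == "Autumn")

print(mask_plain.equals(mask_loc))
print(df.loc[mask_loc, :])
```

For read-only filtering like this, either spelling selects the same rows, so the choice is mostly a matter of style.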


 