Pandas DataFrame isin(): How does conditional selection work in detail?

While working with pandas I ran into an issue which I can't quite explain. Let me give an example where the DataFrame is called "reviews":

The following code doesn't run:

reviews[(reviews["points"] >= 95) & (reviews["country"] in ["Australia"])]

Instead one can use:

reviews[(reviews["points"] >= 95) & (reviews["country"].isin(["Australia"]))]

My first assumption was that this is caused by the way the bitwise operator & works, but when I tested it I was surprised to find that the following expression evaluates to True: True & ("hi" in ["hi", "Hello"])

Obviously reviews["country"] is not just a str. I guess that with the >= operator some magic happens that is not implemented for in, which is why isin() is necessary. Maybe someone can explain this in more detail?
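To make the difference concrete, here is a minimal sketch. The tiny DataFrame is made-up sample data that only borrows the column names from the question. It shows that >= on a Series yields one boolean per row, whereas in has to produce a single bool and therefore fails:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
reviews = pd.DataFrame({
    "country": ["Italy", "Australia", "Australia"],
    "points": [87, 96, 90],
})

# Series overloads comparison operators, so >= works element-wise and
# returns a boolean Series with one entry per row.
mask = reviews["points"] >= 95
print(mask.tolist())

# `in` always has to end up as one single bool. Here that means taking the
# truth value of a whole Series, which pandas refuses to do.
try:
    reviews["country"] in ["Australia"]
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The printed exception text should match the error message quoted in the question.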

The example works with something like the following DataFrame:

    country     description     designation     points  
0   Italy       Aromas          Vulkà Bianco    87  

This structure is basically taken from https://www.kaggle.com/learn/pandas lesson 2.9.

Error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

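in is a Python keyword whose result is always reduced to a single bool, which is what triggers the ValueError for a Series. isin, on the other hand, is a method of Series (and DataFrame) that, in the words of the pandas documentation, checks "whether each element in the DataFrame is contained in values" and therefore returns one boolean per element.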
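As a rough illustration of how the isin() mask combines with the other condition, the sketch below uses an invented four-row DataFrame (the data and the second country are assumptions, not taken from the Kaggle dataset):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
reviews = pd.DataFrame({
    "country": ["Italy", "Australia", "Australia", "New Zealand"],
    "points": [87, 96, 90, 97],
})

# isin() tests every row against the list and returns a boolean Series.
in_country = reviews["country"].isin(["Australia", "New Zealand"])

# Both operands of & are boolean Series of equal length, so they are
# combined row by row; the result selects the rows where both are True.
selected = reviews[(reviews["points"] >= 95) & in_country]
print(selected)
```

The parentheses around each condition matter because & binds more tightly than >= in Python.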
