简体   繁体   中英

Python Pandas check if a value occurs more then once in the same day

I have a Pandas dataframe as below. What I am trying to do is check if a station has variable yyy and any other variable on the same day (as in the case of station1 ). If this is true I need to delete the whole row containing yyy .

Currently I am doing this using iterrows() and looping to search the days in which this variable appears, changing the variable to something like "delete me", building a new dataframe from this (because pandas doesn't support replacing in place ) and filtering the new dataframe to get rid of the unwanted rows. This works now because my dataframes are small, but is not likely to scale.

Question: This seems like a very "non-Pandas" way to do this, is there some other method of deleting out the unwanted variables?

                dateuse         station         variable1
0   2012-08-12 00:00:00        station1               xxx
1   2012-08-12 00:00:00        station1               yyy
2   2012-08-23 00:00:00        station2               aaa
3   2012-08-23 00:00:00        station3               bbb
4   2012-08-25 00:00:00        station4               ccc
5   2012-08-25 00:00:00        station4               ccc
6   2012-08-25 00:00:00        station4               ccc

I might index using a boolean array. We want to delete rows (if I understand what you're after, anyway!) which have yyy and more than one dateuse / station combination.

We can use transform to broadcast the size of each dateuse / station combination up to the length of the dataframe, and then select the rows in groups which have length > 1. Then we can & this with where the yyy s are.

>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform(len) > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> df[~(multiple & must_be_isolated)]
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM