Python Pandas: check if a value occurs more than once in the same day
I have a Pandas dataframe as below. What I am trying to do is check whether a station has variable yyy and any other variable on the same day (as is the case for station1). If so, I need to delete the whole row containing yyy.
Currently I am doing this by looping with iterrows() to find the days on which this variable appears, changing the variable to something like "delete me", building a new dataframe from the result (because pandas doesn't support replacing in place), and filtering the new dataframe to get rid of the unwanted rows. This works for now because my dataframes are small, but it is not likely to scale.
Question: This seems like a very "non-Pandas" way to do this. Is there some other method of deleting the unwanted variables?
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
1  2012-08-12 00:00:00  station1       yyy
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc
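For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: storing dateuse as datetime values is an assumption, since the post shows just the printed output.

```python
import pandas as pd

# Rebuild the sample frame from the printed output.
# Assumption: dateuse is a datetime64 column (the post does not say).
df = pd.DataFrame(
    {
        "dateuse": pd.to_datetime(
            ["2012-08-12", "2012-08-12", "2012-08-23", "2012-08-23",
             "2012-08-25", "2012-08-25", "2012-08-25"]
        ),
        "station": ["station1", "station1", "station2", "station3",
                    "station4", "station4", "station4"],
        "variable1": ["xxx", "yyy", "aaa", "bbb", "ccc", "ccc", "ccc"],
    }
)
```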
I might index using a boolean array. We want to delete rows (if I understand what you're after, anyway!) which have yyy and belong to a dateuse/station combination that contains more than one row.
We can use transform to broadcast the size of each dateuse/station group back to the length of the dataframe, and then select the rows in groups which have length > 1. Then we can & this with a mask of where the yyy values are.
>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform(len) > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> df[~(multiple & must_be_isolated)]
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc
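One caveat: the size-based mask also drops yyy rows when a group consists of several yyy rows and nothing else. If the intent is to drop yyy only when a different value shares the same dateuse/station group, a variant that counts distinct values may be closer. The sketch below is an illustration, not part of the original answer: the function name and its parameters are hypothetical, and it assumes dateuse already identifies the day (no differing time-of-day parts).

```python
import pandas as pd


def drop_marker_when_mixed(df, marker="yyy",
                           keys=("dateuse", "station"),
                           column="variable1"):
    """Drop rows equal to `marker` when their group also holds another value.

    Hypothetical helper: name and parameters are illustrative only.
    """
    # Number of distinct values per group, broadcast back to every row.
    distinct = df.groupby(list(keys))[column].transform("nunique")
    is_marker = df[column] == marker
    # Keep everything except marker rows that sit in a mixed group.
    return df[~((distinct > 1) & is_marker)]
```

Called as drop_marker_when_mixed(df) on the sample frame, this should give the same result as the mask above, because the only yyy row shares its group with xxx.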