
Python Pandas: check if a value occurs more than once in the same day

I have a Pandas dataframe as below. What I am trying to do is check whether a station has variable yyy and any other variable on the same day (as in the case of station1). If it does, I need to delete the whole row containing yyy.

Currently I am doing this with iterrows(): I loop to find the days on which this variable appears, change the variable to something like "delete me", build a new dataframe from the result (because pandas doesn't support replacing in place), and then filter the new dataframe to get rid of the unwanted rows. This works for now because my dataframes are small, but it is not likely to scale.

Question: This seems like a very "non-Pandas" way to do this. Is there some other method of deleting the unwanted rows?

                dateuse         station         variable1
0   2012-08-12 00:00:00        station1               xxx
1   2012-08-12 00:00:00        station1               yyy
2   2012-08-23 00:00:00        station2               aaa
3   2012-08-23 00:00:00        station3               bbb
4   2012-08-25 00:00:00        station4               ccc
5   2012-08-25 00:00:00        station4               ccc
6   2012-08-25 00:00:00        station4               ccc

I might index using a boolean array. We want to delete the rows (if I understand what you're after, anyway!) which have yyy and whose dateuse / station combination occurs more than once.

We can use transform to broadcast the size of each dateuse / station group back to the length of the dataframe, and then flag the rows that belong to groups with more than one row. Then we can & this with a mask of where the yyy values are, and keep everything that does not match both.

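>>> # True for every row whose (dateuse, station) group has more than one row
>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform("size") > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> # keep everything except yyy rows that share their group with other rows
>>> df[~(multiple & must_be_isolated)]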
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc
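
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. The helper name `drop_shared` and its `distinct` flag are my own, not part of pandas, and the frame is rebuilt by hand from the question's sample. One assumption worth flagging: the group-size test also removes yyy rows when a station has several yyy rows and nothing else on that day. If that is not wanted, counting distinct values per group with "nunique" might be closer to the question's wording.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "dateuse": pd.to_datetime([
        "2012-08-12", "2012-08-12", "2012-08-23", "2012-08-23",
        "2012-08-25", "2012-08-25", "2012-08-25",
    ]),
    "station": ["station1", "station1", "station2", "station3",
                "station4", "station4", "station4"],
    "variable1": ["xxx", "yyy", "aaa", "bbb", "ccc", "ccc", "ccc"],
})


def drop_shared(frame, value="yyy", distinct=False):
    """Hypothetical helper: drop rows equal to `value` that share their
    (dateuse, station) group with other rows.

    distinct=False counts rows per group (the approach above);
    distinct=True counts distinct variable1 values per group instead.
    """
    groups = frame.groupby(["dateuse", "station"])["variable1"]
    # transform broadcasts the per-group count back onto every row
    counts = groups.transform("nunique" if distinct else "size")
    shared = counts > 1
    is_value = frame["variable1"] == value
    return frame[~(shared & is_value)]


result = drop_shared(df)
print(result)
```

On the sample data both settings of `distinct` drop only row 1; they differ only when a group contains nothing but repeated yyy rows.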

