
Select rows from a DataFrame with a variable number of conditions

I'm trying to write a function that takes two inputs: a DataFrame with a 'timestamp' column, and a list of tuples. Each tuple contains a start time and an end time.

I want to "split" the DataFrame into two new ones. The first contains the rows whose timestamp does not fall between the endpoints of any tuple, and the second contains the complementary rows. However, the number of filter tuples is not known in advance.

import pandas as pd

df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100], 'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]
left, removed = func(df, filt)  # func is the function I'm trying to write

This should give me two DataFrames:

  • left: the rows with timestamp [0, 5, 6, 7, 100]
  • removed: the rows with timestamp [1, 2, 11, 22, 33]
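One way to build `func` is to OR together one boolean mask per tuple, so the number of tuples no longer matters. The sketch below assumes both endpoints are inclusive, which is consistent with timestamp 1 landing in `removed` for the tuple (1, 4). It uses `functools.reduce` with `Series.between`, starting from an all-False mask so that an empty `filt` list removes nothing.

```python
from functools import reduce
import operator

import pandas as pd


def func(df, filt):
    """Split df into (left, removed) using a variable-length list of (start, end) tuples.

    Assumption: bounds are inclusive on both ends (Series.between's default).
    """
    # All-False starting mask: with an empty filt, every row stays in `left`
    start = pd.Series(False, index=df.index)
    # OR together one between() mask per (start, end) tuple
    in_any = reduce(operator.or_, (df['timestamp'].between(lo, hi) for lo, hi in filt), start)
    return df[~in_any], df[in_any]


df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100],
                   'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]
left, removed = func(df, filt)
print(left['timestamp'].tolist())     # [0, 5, 6, 7, 100]
print(removed['timestamp'].tolist())  # [1, 2, 11, 22, 33]
```

Because each mask is vectorized, this stays fast even for large frames. The only Python-level loop is over the tuples, not over the rows.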

I believe the right approach is to write a custom function that can be used as a filter, and then somehow use it to filter/mask the DataFrame, but I couldn't find a proper example of how to implement this.
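The custom-filter idea described here can be expressed as a row-level predicate passed to `Series.apply`. In this sketch, `in_any_interval` is a hypothetical helper name and bounds are assumed inclusive. The `args=` keyword forwards the interval list to the predicate.

```python
import pandas as pd


def in_any_interval(t, intervals):
    """Hypothetical predicate: True if t lies in any inclusive (start, end) pair."""
    return any(start <= t <= end for start, end in intervals)


df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100],
                   'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]

# Series.apply calls the predicate once per timestamp and returns a boolean Series
mask = df['timestamp'].apply(in_any_interval, args=(filt,))
left, removed = df[~mask], df[mask]
```

This works, but it runs Python code for every row. For large DataFrames, a vectorized mask built from `Series.between` is usually much faster.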

Check this:

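import pandas as pd

# One boolean column per (start, end) tuple; Series.between() includes both ends by default.
# .any(level=0) was removed in pandas 2.0, so put the masks side by side and reduce row-wise.
mask = pd.concat([df.timestamp.between(*x) for x in filt], axis=1).any(axis=1)
out = df[~mask]      # rows outside every interval (the "left" frame)
removed = df[mask]   # the complementary rows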
Out[175]: 
   timestamp  x
0          0  1
3          5  4
4          6  5
5          7  6
9        100  1
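An alternative sketch uses NumPy broadcasting to compare every timestamp against every interval in one step. It assumes inclusive bounds and numeric timestamps. `reshape(-1, 2)` keeps the bounds two-dimensional even when `filt` is empty, in which case no rows are removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100],
                   'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]

ts = df['timestamp'].to_numpy()[:, None]     # shape (n_rows, 1)
bounds = np.asarray(filt).reshape(-1, 2)     # shape (n_intervals, 2)
# Broadcast to (n_rows, n_intervals): True where row i lies in interval j
hits = (ts >= bounds[:, 0]) & (ts <= bounds[:, 1])
mask = hits.any(axis=1)                      # row lies inside at least one interval
left, removed = df[~mask], df[mask]
```

The intermediate `hits` array has `n_rows * n_intervals` elements, so this trades memory for speed. It suits a moderate number of intervals.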

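Can't you just filter with .isin()? Note that this hard-codes the expected timestamps, so it does not generalize to an arbitrary list of intervals: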

left,removed = df[df['timestamp'].isin([0,5,6,7,100])],df[df['timestamp'].isin([1,2,11,22,33])]
