
Pandas select rows when column value within range from another row column value with group filter

I would like to extend a question I asked earlier (link to question).

The scenario here is more complex, so I think the solutions given there will not fit.

I'm trying to create a subset from a DataFrame (100k-500k rows) with the following format:

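import pandas as pd
d = {'time': [1, 2, 3, 5, 7, 9, 9.5, 10],
     'val': ['not', 'match', 'match', 'not', 'not', 'match', 'match', 'match'],
     'group': ['a', 'a', 'b', 'b', 'b', 'a', 'a', 'c']}
# Fix the column order so the printout matches the table below
df = pd.DataFrame(d, columns=['group', 'time', 'val'])
print(df)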
  group  time    val
0     a   1.0    not
1     a   2.0  match
2     b   3.0  match
3     b   5.0    not
4     b   7.0    not
5     a   9.0  match
6     a   9.5  match
7     c  10.0  match

I want to select every row whose time is within a limited range of another row's time, provided the two rows are from different groups. For example, if the range is <= 1:

  • row0 has a valid time diff (row1 - row0), but both rows are in the same group, so row0 is not selected.
  • row1 has a valid time diff (row2 - row1), and the two rows are in different groups.
  • row5 has a valid time diff (row7 - row5), and the two rows are in different groups.
  • row6 has a valid time diff (row7 - row6), and the two rows are in different groups.

My desired output:

  group  time    val
1     a   2.0  match
2     b   3.0  match
5     a   9.0  match
6     a   9.5  match
7     c  10.0  match

This works for your example, and hopefully for your data as well:

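# Time gap to the previous or the next row is within the range
near = (df['time'].diff() <= 1) | (df['time'].diff(-1) >= -1)
# Group differs from the next or the previous row (edge rows are compared with themselves)
other = (df['group'] != df['group'].shift(-1).fillna(df['group'])) | (df['group'] != df['group'].shift(1).fillna(df['group']))
df.loc[near & other]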
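
The expression above compares each row only with its immediate neighbours, and it checks the time condition and the group condition independently, so it relies on the rows being sorted by time and on the matching row being adjacent. As a hedged alternative (not the answerer's method), the sketch below checks each row against every row inside its time window. For each row it counts all rows with a time in [time - limit, time + limit] and subtracts the rows in that window that share its group. The function name `rows_near_other_group` and the `limit` parameter are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd


def rows_near_other_group(df, limit=1.0):
    """Keep rows that have a row from a different group within `limit` time units."""
    t = df['time'].to_numpy()

    # Count all rows whose time lies in [t - limit, t + limit] (the row itself included).
    all_sorted = np.sort(t)
    total = (np.searchsorted(all_sorted, t + limit, side='right')
             - np.searchsorted(all_sorted, t - limit, side='left'))

    # Count the rows in the same window that belong to the row's own group.
    same = np.zeros(len(df), dtype=np.int64)
    for _, pos in df.groupby('group').indices.items():
        g_sorted = np.sort(t[pos])
        same[pos] = (np.searchsorted(g_sorted, t[pos] + limit, side='right')
                     - np.searchsorted(g_sorted, t[pos] - limit, side='left'))

    # A surplus means at least one row from another group is inside the window.
    return df[total > same]


d = {'time': [1, 2, 3, 5, 7, 9, 9.5, 10],
     'val': ['not', 'match', 'match', 'not', 'not', 'match', 'match', 'match'],
     'group': ['a', 'a', 'b', 'b', 'b', 'a', 'a', 'c']}
df = pd.DataFrame(d, columns=['group', 'time', 'val'])
result = rows_near_other_group(df, limit=1.0)
print(result)
```

On the example data this selects rows 1, 2, 5, 6 and 7. It does not require the frame to be sorted, and the work is dominated by sorting and binary searches rather than pairwise comparisons, which should matter at 100k-500k rows.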


