After applying groupby, how do I select rows based on a comparison of different column values across rows?

Question

If I have the following data:

Name	Start	End
A	3/4/12	7/9/14
B	5/2/17	6/3/18
C	4/10/13	5/12/14
A	4/6/13	7/12/15
B	4/12/19	12/3/21
C	12/6/13	11/3/14

For each unique name (A, B, C), I want to select the rows whose end date falls later than the start date of every other row with the same name, that is, every row except the one whose end date is being considered. So, A and C in this case. Basically, first use groupby(['Name']) and then pick the rows where the end date is later than the start date when the comparison is made across rows for the same name.

Answer 1

Convert both columns to datetimes so that they can be compared with Series.lt, then use GroupBy.all to check whether all values are True for each Name, and finally filter the index:

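import pandas as pd

# Parse both columns as month/day/two-digit-year dates
df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y')

# True for a name only if Start is before End in all of its rows
s = df['Start'].lt(df['End']).groupby(df['Name']).all()

# Keep the names whose flag is True
out = s.index[s].tolist()
print(out)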
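
To try this approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's table, and it assumes the last row's name is an upper-case "C" and that the dates are month/day/year, as the format string in the answer implies. Note that this check compares Start and End within the same row; on this sample every row passes, so all three names are returned.

```python
import pandas as pd

# Sample data copied from the question (last name assumed to be "C")
df = pd.DataFrame({
    "Name": ["A", "B", "C", "A", "B", "C"],
    "Start": ["3/4/12", "5/2/17", "4/10/13", "4/6/13", "4/12/19", "12/6/13"],
    "End": ["7/9/14", "6/3/18", "5/12/14", "7/12/15", "12/3/21", "11/3/14"],
})
df["Start"] = pd.to_datetime(df["Start"], format="%m/%d/%y")
df["End"] = pd.to_datetime(df["End"], format="%m/%d/%y")

# Row-wise check: is Start before End in the same row?
# Then require it to hold for every row of a name.
s = df["Start"].lt(df["End"]).groupby(df["Name"]).all()
out = s.index[s].tolist()
print(out)  # ['A', 'B', 'C']
```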

Or change the mask to Series.gt and get the difference with numpy.setdiff1d:

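import numpy as np
import pandas as pd

df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y')

# All names minus those with any row where Start is after End
out = np.setdiff1d(df['Name'], df.loc[df['Start'].gt(df['End']), 'Name']).tolist()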
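
Both snippets above compare Start and End within the same row. If the intent is the cross-row comparison the question describes (each row's End against the Start of every other row with the same name), one possible sketch is shown below. It is not part of the original answer: the `row_ok` and `names_ok` names are made up for illustration, the sample data is copied from the question with the last name assumed to be "C", and a name with only one row is treated as passing because there is nothing to compare against.

```python
import pandas as pd

# Sample data copied from the question (last name assumed to be "C")
df = pd.DataFrame({
    "Name": ["A", "B", "C", "A", "B", "C"],
    "Start": ["3/4/12", "5/2/17", "4/10/13", "4/6/13", "4/12/19", "12/6/13"],
    "End": ["7/9/14", "6/3/18", "5/12/14", "7/12/15", "12/3/21", "11/3/14"],
})
df["Start"] = pd.to_datetime(df["Start"], format="%m/%d/%y")
df["End"] = pd.to_datetime(df["End"], format="%m/%d/%y")

# row_ok[i] is True when row i's End is later than the Start of
# every *other* row that shares its Name.
row_ok = pd.Series(False, index=df.index)
for name, group in df.groupby("Name"):
    for idx, end in group["End"].items():
        other_starts = group["Start"].drop(idx)
        row_ok.loc[idx] = bool((other_starts < end).all())

# A name qualifies only if all of its rows pass the cross-row check.
names_ok = row_ok.groupby(df["Name"]).all()
out = names_ok.index[names_ok].tolist()
print(out)  # ['A', 'C']
```

The nested loop is quadratic in the size of each group, which is fine for small groups; `df[row_ok]` gives the individual rows that pass, if rows rather than names are wanted.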

Solution 1
0 2021-12-22 07:15:13
