Count unique values in a subset of grouped rows in a Pandas dataframe. (I did my best with the title)

Question

I am trying to find the fastest time of a certain sender. In the attached picture of the Pandas DF, you will see that I have rows of IP addresses with Time and SeqNo (I know there are others, but they don't matter). Basically, what I'm trying to do is find which IP has the fastest time (i.e. the smallest number in the Time column) among the rows that share the same SeqNo. For example, for SeqNo 0 the fastest IP would be 10.10.10.7, because it has the smallest Time value (the times are Unix timestamps). I need to do this over all the SeqNo groups and find which IP is fastest most often, i.e. which IP has the smallest time in the most SeqNo groups.

I have tried a few different for loops, nested for loops and while loops, plus a few other things in pandas, but I'm having no luck. Please help if you can.
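For comparison with the loop attempts mentioned above, here is a minimal plain-Python sketch of the same two-step logic: keep the fastest row per SeqNo, then count how many groups each IP wins. The `rows` list is a hypothetical stand-in for the DataFrame (borrowed from the sample output in Answer 1), and ties are resolved in favour of the row seen first.

```python
from collections import Counter

# Hypothetical (Source, SeqNo, Time) records standing in for the DataFrame rows
rows = [
    ("10.10.10.8", 0, 0),
    ("10.10.10.6", 0, 1),
    ("10.10.10.2", 1, 2),
    ("10.10.10.4", 1, 3),
    ("10.10.10.2", 1, 4),
    ("10.10.10.8", 2, 5),
    ("10.10.10.7", 2, 6),
]

# One pass: remember the (Time, Source) of the fastest row seen so far per SeqNo
fastest = {}
for source, seq, time in rows:
    if seq not in fastest or time < fastest[seq][0]:
        fastest[seq] = (time, source)

# Count how many SeqNo groups each IP won
wins = Counter(source for _, source in fastest.values())
print(fastest)              # {0: (0, '10.10.10.8'), 1: (2, '10.10.10.2'), 2: (5, '10.10.10.8')}
print(wins.most_common(1))  # [('10.10.10.8', 2)]
```

This works, but on a large DataFrame the vectorized pandas approaches in the answers are the more idiomatic choice.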

Answer 1

Instead of looping, pandas lets you groupby() the SeqNo and compare each row's Time with the minimum Time of its group:

# Broadcast each group's minimum Time back onto its rows, then keep the rows that match it
index = df.groupby('SeqNo').Time.transform('min') == df.Time
df[index]

#        Source  SeqNo  Time
# 0  10.10.10.8      0     0
# 2  10.10.10.2      1     2
# 5  10.10.10.8      2     5
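The filter works because transform('min') returns a Series aligned with the original rows, so each row can be compared with its own group's minimum. One consequence, shown in this sketch on a small hypothetical frame, is that rows tied for the minimum are all kept.

```python
import pandas as pd

# Small hypothetical frame where SeqNo 1 has a tie for the fastest Time
df = pd.DataFrame({
    'Source': ['10.10.10.8', '10.10.10.6', '10.10.10.2', '10.10.10.4'],
    'SeqNo': [0, 0, 1, 1],
    'Time': [0, 1, 2, 2],
})

# transform('min') returns a Series with the same index as df, so every row
# carries its group's minimum and can be compared element-wise with df.Time
group_min = df.groupby('SeqNo').Time.transform('min')
print(group_min.tolist())  # [0, 0, 2, 2]

# Both tied rows of SeqNo 1 survive the filter
fastest = df[df.Time == group_min]
print(fastest.index.tolist())  # [0, 2, 3]
```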

Based on this sample data (the Source values come from np.random.randint, so they will differ on each run):

import numpy as np
import pandas as pd
df = pd.DataFrame({'Source': [f'10.10.10.{i}' for i in np.random.randint(1, 9, 10)], 'SeqNo': [0]*2 + [1]*3 + [2]*5, 'Time': range(10)})

#        Source  SeqNo  Time
# 0  10.10.10.8      0     0
# 1  10.10.10.6      0     1
# 2  10.10.10.2      1     2
# 3  10.10.10.4      1     3
# 4  10.10.10.2      1     4
# 5  10.10.10.8      2     5
# 6  10.10.10.7      2     6
# 7  10.10.10.7      2     7
# 8  10.10.10.3      2     8
# 9  10.10.10.8      2     9
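The question ultimately asks which IP is fastest in the most SeqNo groups. A minimal sketch of that last step, assuming the filtered rows from this answer, is a value_counts() on Source; the frame below hard-codes the printed sample (instead of using random Sources) so the result is reproducible.

```python
import pandas as pd

# Fixed copy of the sample frame printed above
df = pd.DataFrame({
    'Source': ['10.10.10.8', '10.10.10.6', '10.10.10.2', '10.10.10.4', '10.10.10.2',
               '10.10.10.8', '10.10.10.7', '10.10.10.7', '10.10.10.3', '10.10.10.8'],
    'SeqNo': [0]*2 + [1]*3 + [2]*5,
    'Time': range(10),
})

# Keep the fastest row(s) per SeqNo, as in the answer above
winners = df[df.groupby('SeqNo').Time.transform('min') == df.Time]

# Count how many SeqNo groups each IP won; the top entry is the overall fastest IP
wins = winners.Source.value_counts()
print(wins.idxmax(), wins.max())  # 10.10.10.8 2
```

If several IPs tie for a group's minimum, each of them is counted as a win for that group.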

Answer 2

You could sort the dataframe by the SeqNo and Time columns first. After grouping by SeqNo, use head(1) or first() to choose the first row in each group. Note that head(1) keeps the original rows and index, while first() returns the first non-null value of each column per group, indexed by SeqNo.

df = df.sort_values(['SeqNo', 'Time'], ascending=[True, True]).groupby('SeqNo').head(1)
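If exactly one row per SeqNo is wanted without sorting the whole frame, groupby(...).idxmin() is an alternative (swapped in here, not part of this answer). A minimal sketch on a small hypothetical frame without ties, checked against the sort + head(1) version:

```python
import pandas as pd

# Small hypothetical frame with no ties in Time
df = pd.DataFrame({
    'Source': ['10.10.10.8', '10.10.10.6', '10.10.10.2', '10.10.10.4'],
    'SeqNo': [0, 0, 1, 1],
    'Time': [1, 0, 3, 2],
})

# idxmin returns, per SeqNo, the index label of the row with the smallest Time
# (the first such label on ties); .loc then pulls exactly one row per group
fastest = df.loc[df.groupby('SeqNo')['Time'].idxmin()]
print(fastest.Source.tolist())  # ['10.10.10.6', '10.10.10.4']

# The sort + head(1) approach from the answer selects the same rows
via_sort = df.sort_values(['SeqNo', 'Time']).groupby('SeqNo').head(1)
print(via_sort.index.tolist() == fastest.index.tolist())  # True
```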

Answer 1: 0 2021-03-28 05:25:15

Answer 2: 0 2021-03-28 09:50:07