Counting unique values in a subset of grouped rows in a Pandas DataFrame (I did my best with the title)
I am trying to find the fastest time of a certain sender. In the attached picture of the Pandas DataFrame, you will see that I have rows of IP addresses with Time and SeqNo (I know there are others, but they don't matter). Basically, what I'm trying to do is find which IP is fastest (i.e. has the smallest number in the Time column) among rows with the same SeqNo. For example, for SeqNo 0 the fastest IP would be 10.10.10.7, because it has the smallest Time value (the times are Unix timestamps). I need to do this for every SeqNo group and then find which IP is fastest most often, i.e. which IP has the smallest time in the most SeqNo groups.
I have tried a few different for loops, nested for loops, while loops and a few different things in pandas, but I'm having no luck. Please help if you can.
Instead of looping, pandas lets you groupby() the SeqNo column and get the minimum Time per group:
# Boolean mask: True where a row's Time equals the minimum Time of its SeqNo group
mask = df.groupby('SeqNo').Time.transform('min') == df.Time
df[mask]
# Source SeqNo Time
# 0 10.10.10.8 0 0
# 2 10.10.10.2 1 2
# 5 10.10.10.8 2 5
Based on this sample data (the Source IPs are random, so they will differ from run to run):
import numpy as np
import pandas as pd
df = pd.DataFrame({'Source':[f'10.10.10.{i}' for i in np.random.randint(1,9,10)],'SeqNo':[0]*2+[1]*3+[2]*5,'Time':range(10)})
# Source SeqNo Time
# 0 10.10.10.8 0 0
# 1 10.10.10.6 0 1
# 2 10.10.10.2 1 2
# 3 10.10.10.4 1 3
# 4 10.10.10.2 1 4
# 5 10.10.10.8 2 5
# 6 10.10.10.7 2 6
# 7 10.10.10.7 2 7
# 8 10.10.10.3 2 8
# 9 10.10.10.8 2 9
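The transform('min') mask keeps every row that ties for the group minimum, so a single SeqNo can yield more than one row. If exactly one row per SeqNo is wanted, one alternative (swapped in here, not part of the answer above) is idxmin, which returns the index label of the first minimal Time in each group. This is a minimal sketch that assumes the column names Source, SeqNo and Time from the question and uses fixed, made-up values instead of random ones so the result is reproducible:

```python
import pandas as pd

# Made-up sample data (an assumption, not the asker's real data):
# SeqNo 0 has a tie at Time 3 between 10.10.10.7 and 10.10.10.2.
df = pd.DataFrame({
    'Source': ['10.10.10.7', '10.10.10.2', '10.10.10.5',
               '10.10.10.2', '10.10.10.7', '10.10.10.7'],
    'SeqNo': [0, 0, 0, 1, 1, 2],
    'Time': [3, 3, 8, 4, 6, 1],
})

# Mask approach: keeps every row that ties for the group minimum.
tied = df[df.groupby('SeqNo')['Time'].transform('min') == df['Time']]

# idxmin approach: index label of the first minimal Time per SeqNo,
# so each group contributes exactly one row.
fastest = df.loc[df.groupby('SeqNo')['Time'].idxmin()]

print(tied)
print(fastest)
```

Here tied has two rows for SeqNo 0, while fastest keeps only the first of the tied rows.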
You could sort the dataframe by the SeqNo and Time columns first. After grouping by SeqNo, use head(1) or first() to choose the first item in each group.
# Sort so the smallest Time comes first within each SeqNo, then keep that first row per group
df = df.sort_values(['SeqNo', 'Time'], ascending=[True, True]).groupby('SeqNo').head(1)
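Both approaches stop at one fastest row per SeqNo, while the question also asks which IP is fastest most often. A minimal sketch of that last step, assuming the per-group winners are obtained with the sort + head(1) approach from this answer (the data below is made up), counts how many SeqNo groups each Source won using value_counts:

```python
import pandas as pd

# Made-up sample data; column names follow the question.
df = pd.DataFrame({
    'Source': ['10.10.10.7', '10.10.10.2', '10.10.10.2', '10.10.10.7',
               '10.10.10.7', '10.10.10.5', '10.10.10.8', '10.10.10.7'],
    'SeqNo': [0, 0, 1, 1, 2, 2, 3, 3],
    'Time': [1, 2, 5, 4, 7, 9, 3, 6],
})

# One fastest row per SeqNo: sort by SeqNo then Time, keep the first row of each group.
winners = df.sort_values(['SeqNo', 'Time']).groupby('SeqNo').head(1)

# How many SeqNo groups each Source was fastest in, most wins first.
wins = winners['Source'].value_counts()

print(wins)
print('Fastest most often:', wins.idxmax())
```

With this data, 10.10.10.7 wins three of the four SeqNo groups and 10.10.10.8 wins one.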