Counting occurrences of unique tuples in a list of tuples

Question

For a classifieds Django website project, I have a list of tuples made up of (user_id, ad_id) pairs. Each pair denotes the clicker's user_id along with the relevant ad_id.

For example:

gross_clicks = [(1, 13), (1, 12), (1, 13), (2, 45), (2, 13), (1, 15)]  # ... and so on, up to (n, m)

The elements in this list are by no means unique: each click gets pushed into this list regardless of whether it comes from the same user and/or is on the same ad.

Now I can get the count of all unique clicks (i.e. distinct user_ids) by doing:

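import operator

# Keep only the user_id (index 0) of each (user_id, ad_id) pair
gross_click_ids = map(operator.itemgetter(0), gross_clicks)
# Distinct user_ids, regardless of which ad they clicked
unique_click_count = len(set(gross_click_ids))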

But how do I get unique clicks per ad? That is, if the same user clicked on two different ads, that would be counted as 2 separate clicks.

Performance matters too (it's a large data set), so I would prefer the most efficient solution, along with an illustrative example.

Answer 1

Use the distinct method on the queryset instead. Let's say your model is User and you want to get unique user_id, ad_id pairs:

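# distinct() with field names (DISTINCT ON) works only on PostgreSQL; plain .distinct() is portable
User.objects.all().values_list('id', 'ad_id').distinct('id', 'ad_id')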

This performs the work at the database level, so I expect it would be faster than doing it in Python, as Willem mentioned.
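To illustrate what "doing the work at the database level" means without setting up a Django project, here is a minimal sketch using Python's built-in sqlite3 module. The clicks table and its user_id/ad_id columns are hypothetical stand-ins for your own model; the SELECT DISTINCT over two columns is roughly the kind of SQL a values_list(...).distinct() queryset asks the database to run.

```python
import sqlite3

# Hypothetical schema: one row per raw click, mirroring the (user_id, ad_id) tuples.
gross_clicks = [(1, 13), (1, 12), (1, 13), (2, 45), (2, 13), (1, 15)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, ad_id INTEGER)")
conn.executemany("INSERT INTO clicks (user_id, ad_id) VALUES (?, ?)", gross_clicks)

# The database drops duplicate (user_id, ad_id) pairs before returning rows.
unique_pairs = conn.execute(
    "SELECT DISTINCT user_id, ad_id FROM clicks ORDER BY user_id, ad_id"
).fetchall()

print(unique_pairs)
# [(1, 12), (1, 13), (1, 15), (2, 13), (2, 45)]
```

Only the deduplicated rows travel back to Python, which is where the expected speed-up over filtering the full list in application code comes from.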

I may have misunderstood your question. Please let me know if that's the case so I can try to provide an alternative solution.

Answer 2

Just take the unique tuples:

unique_clicks = set(gross_clicks)

This gives you the set of unique (user_id, ad_id) pairs, i.e. the unique user clicks per ad.

In your sample input, (1, 13) appears twice, but in a set it appears just once:

>>> gross_clicks = [(1, 13), (1, 12), (1, 13), (2, 45), (2, 13), (1, 15)]
>>> set(gross_clicks)
{(1, 12), (1, 13), (1, 15), (2, 45), (2, 13)}

Given a large list of tuples as input, using a set to track unique elements is about as efficient as it gets: testing whether a given tuple is already in the set is an O(1) operation on average, so deduplicating the whole list takes linear time.
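Since the question asks for unique clicks per ad, one way to go from the deduplicated set to a number for each ad is to tally the ad_id of every unique pair. This is a minimal sketch using collections.Counter from the standard library; the variable names are illustrative. Each distinct (user_id, ad_id) pair contributes exactly one click to its ad, so a user who clicks two different ads is counted once for each of them.

```python
from collections import Counter

gross_clicks = [(1, 13), (1, 12), (1, 13), (2, 45), (2, 13), (1, 15)]

# Deduplicate first: each (user_id, ad_id) pair now appears once.
unique_clicks = set(gross_clicks)

# Count how many distinct users clicked each ad.
unique_clicks_per_ad = Counter(ad_id for _user_id, ad_id in unique_clicks)

print(unique_clicks_per_ad[13])  # 2 (users 1 and 2)
print(unique_clicks_per_ad[12])  # 1
```

Both steps are single passes over the data, so the whole computation stays linear in the number of raw clicks.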

However, if this data comes from your database, it is more efficient to ask the database to give you the unique pairs instead.
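If the clicks live in a database table, the per-ad count itself can also be pushed into the query. Below is a minimal sketch with Python's built-in sqlite3 module and a hypothetical clicks table with user_id and ad_id columns: GROUP BY ad_id together with COUNT(DISTINCT user_id) returns one row per ad holding the number of distinct users who clicked it. In Django this would roughly correspond to a values('ad_id') queryset annotated with Count('user_id', distinct=True), but treat that mapping as an assumption to check against your own models.

```python
import sqlite3

# Hypothetical table holding one row per raw click.
gross_clicks = [(1, 13), (1, 12), (1, 13), (2, 45), (2, 13), (1, 15)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, ad_id INTEGER)")
conn.executemany("INSERT INTO clicks (user_id, ad_id) VALUES (?, ?)", gross_clicks)

# One row per ad: the number of distinct users who clicked it.
rows = conn.execute(
    """
    SELECT ad_id, COUNT(DISTINCT user_id) AS unique_clicks
    FROM clicks
    GROUP BY ad_id
    ORDER BY ad_id
    """
).fetchall()

print(rows)
# [(12, 1), (13, 2), (15, 1), (45, 1)]
```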


Answer 1: score 0, 2017-07-29 15:40:27

Answer 2 (accepted): score 0, 2017-07-29 15:43:58
