How to find the N most common elements in a list?

I have a large list of tuples such as [('a','b'), ('a','c'), ('b','e'), ('a','d')], and I would like to find the top N most popular entries in it. The most popular entry is the one that is repeated the most; here the top two are a and b. How can I use networkx to solve this problem when the list has millions of items?

import networkx as nx
lists = [('a','b'), ('a','c'), ('b','e'), ('a','d')]
G = nx.Graph()
G.add_edges_from(lists)
nx.degree(G)

This gives me the degree (popularity) of each node, but I am having trouble displaying the top N popular items.

You can cast the degree view to a dictionary and use the collections.Counter.most_common method:

import networkx as nx
from collections import Counter
lst = [('a','b'), ('a','c'), ('b','e'), ('a','d')]
G=nx.Graph()
G.add_edges_from(lst)
degrees = G.degree()
most_common = Counter(dict(degrees)).most_common(1)
print(most_common)

Output:

[('a', 3)]

It seems you want to learn about networkx, but for this specific question you don't even need to create a graph. You can simply flatten the list and count the elements:

from itertools import chain
from collections import Counter
counts = Counter(chain.from_iterable(lst))
counts.most_common(1)

Is there any particular reason why you want to use networkx?
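One caveat when choosing between the two approaches: a plain nx.Graph stores each undirected edge only once, so a node's degree counts its distinct neighbours, while the flattened Counter counts every appearance. The two agree for the sample list, but they can differ when the list contains repeated or reversed pairs. The following is a standard-library sketch of that difference, not networkx itself; the sample data and variable names are made up for illustration, and it assumes no pair contains the same element twice.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample with a repeated pair and a reversed pair
pairs = [('a', 'b'), ('a', 'b'), ('b', 'a'), ('a', 'c')]

# Raw occurrence count: every appearance of an element counts
occurrences = Counter(chain.from_iterable(pairs))

# Degree-like count: treat each pair as unordered and count it once,
# which mimics how a simple undirected graph collapses duplicate edges
unique_pairs = {frozenset(pair) for pair in pairs}
distinct = Counter(chain.from_iterable(unique_pairs))

print(occurrences['a'], occurrences['b'], occurrences['c'])  # 4 3 1
print(distinct['a'], distinct['b'], distinct['c'])           # 2 1 1
```

If "popular" means "appears most often", the raw count is probably what you want; if it means "connected to the most different partners", the degree-like count is closer.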

You can achieve this simply with collections.Counter and itertools.chain:

from collections import Counter
from itertools import chain

l = [('a','b'), ('a','c'), ('b','e'), ('a','d')]

Counter(chain.from_iterable(l)).most_common(2)

Note: most_common(2) returns the top 2 entries.

Output: [('a', 3), ('b', 2)]

To only get the keys in decreasing order of frequency:

c = Counter(chain.from_iterable(l))

list(dict(c.most_common(2)))

Output: ['a', 'b']
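Since the question mentions lists with millions of items, it may help to wrap this in a small function that accepts any iterable, so the pairs can be streamed from a generator instead of being held in a list first. This is only a sketch: the names top_n and generate_pairs are hypothetical helpers, not part of any library.

```python
from collections import Counter
from itertools import chain


def top_n(pairs, n):
    """Return the n most frequent elements across an iterable of tuples."""
    # chain.from_iterable flattens lazily, so only the counts are kept in memory
    counts = Counter(chain.from_iterable(pairs))
    return [item for item, _count in counts.most_common(n)]


def generate_pairs():
    """Hypothetical stand-in for reading pairs from a file or database."""
    yield ('a', 'b')
    yield ('a', 'c')
    yield ('b', 'e')
    yield ('a', 'd')


print(top_n(generate_pairs(), 2))  # ['a', 'b']
```

Memory use then grows with the number of distinct elements rather than with the number of pairs.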

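You can sort the degrees in descending order and then take the first N nodes. Simply iterating over the degree view is not enough, because it yields nodes in insertion order rather than by degree.

import networkx as nx
lists = [('a','b'), ('a','c'), ('b','e'), ('a','d')]
G = nx.Graph()
G.add_edges_from(lists)
N = 2
# Sort (node, degree) pairs by degree, highest first
dvweight = sorted(nx.degree(G), key=lambda pair: pair[1], reverse=True)
popular_nodes = [node for node, degree in dvweight]
print(popular_nodes[:N])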

OUTPUT

['a', 'b']
