
Counting the number of co-occurrences in a list

I have a list of lists of strings (you can assume each string is a single word).

I want an efficient way, in Python, to count pairs of words in this array.

This is not collocation or bigram counting, since the two words of a pair may be at any positions in a list.

It's unclear what your list looks like. Is it something like:

li = ['hello','bye','hi','good','bye','hello']

If so, the solution is simple:

In [1342]: [i for i in set(li) if li.count(i) > 1]
Out[1342]: ['bye', 'hello']

Otherwise, if it looks like:

li = [['hello'],['bye','hi','good'],['bye','hello']]

Then:

In [1378]: f = []

In [1379]: for x in li:
..........     for i in x:
..........         f.append(i)

In [1380]: f
Out[1380]: ['hello', 'bye', 'hi', 'good', 'bye', 'hello']

In [1381]: [i for i in set(f) if f.count(i) > 1]
Out[1381]: ['bye', 'hello']
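
Calling `f.count(i)` inside the comprehension rescans the whole list for every distinct word. A minimal sketch of a single-pass alternative, assuming the same nested `li` as above (the names `word_counts` and `repeated` are only illustrative):

```python
from collections import Counter
from itertools import chain

li = [['hello'], ['bye', 'hi', 'good'], ['bye', 'hello']]

# Flatten the nested lists lazily and count every word in one pass.
word_counts = Counter(chain.from_iterable(li))

# Keep only the words seen more than once; sort for a stable order.
repeated = sorted(word for word, n in word_counts.items() if n > 1)
print(repeated)  # ['bye', 'hello']
```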

>>> from itertools import chain, combinations
>>> from collections import Counter
>>> L = [['foo', 'bar'], ['apple', 'orange', 'mango'], ['bar']]
>>> c = Counter(frozenset(x) for x in combinations(chain.from_iterable(L), r=2))
>>> c
Counter({frozenset(['mango', 'bar']): 2, frozenset(['orange', 'bar']): 2, frozenset(['foo', 'bar']): 2, frozenset(['bar', 'apple']): 2, frozenset(['orange', 'apple']): 1, frozenset(['foo', 'apple']): 1, frozenset(['bar']): 1, frozenset(['orange', 'mango']): 1, frozenset(['foo', 'mango']): 1, frozenset(['mango', 'apple']): 1, frozenset(['orange', 'foo']): 1})
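
Note that the `Counter` snippet above flattens everything first, so it also pairs words that come from different inner lists. If the goal is to count only pairs that appear together in the same inner list, something like the following might be closer. This is a sketch under that assumption; `count_cooccurrences` and the sample data are hypothetical names and values, not from the question:

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(lists):
    """Count, for each unordered word pair, how many inner lists contain both words."""
    counts = Counter()
    for words in lists:
        # Deduplicate and sort so each pair is produced once per list,
        # always in the same (alphabetical) order.
        unique_words = sorted(set(words))
        counts.update(combinations(unique_words, 2))
    return counts


# Hypothetical sample data for illustration.
sample = [['foo', 'bar'], ['apple', 'orange', 'mango'], ['bar', 'foo', 'apple']]
pair_counts = count_cooccurrences(sample)
print(pair_counts[('bar', 'foo')])  # 2
```

Pairs are stored as alphabetically sorted tuples, so look them up in that order, e.g. `pair_counts[('apple', 'bar')]`. A pair that never co-occurs simply returns 0.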

