If flattened is just a list of strings, for example
['There','is','only','passion','and','piece','is','a','lie','lie','lie']
then in the following two lines
from collections import Counter
c = Counter(flattened)
vocab = [x for x, count in c.items() if count>=2]
What does the part [x for x, ...] mean? Also, shouldn't count be of type tuple, since I suppose it is a Counter item? How does the part count>=2 work?!
Note: I understand from debugging that the first line converts the list into a Counter and the second one removes the items that occurred fewer than twice, but I can't really interpret the syntax.
So the syntax here is a little confusing, but what's actually happening is that each item in c.items() is a tuple containing a word and its count.
A clearer way of writing this would be:
vocab = [x for (x, count) in c.items() if count>=2]
But it could also be done like this:
vocab = [x[0] for x in c.items() if x[1]>=2]
where x is a tuple.
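A minimal sketch, assuming Python 3.7+ (where a Counter, like any dict, keeps insertion order) and reusing the example list from the question, showing that the unpacking form and the indexing form produce the same list:

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

# Form 1: unpack each (word, count) tuple into two names
vocab_unpacked = [x for (x, count) in c.items() if count >= 2]

# Form 2: keep each tuple whole and index into it
vocab_indexed = [x[0] for x in c.items() if x[1] >= 2]

print(vocab_unpacked)  # ['is', 'lie']
print(vocab_indexed)   # ['is', 'lie']
```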
It can also be helpful to look at what c actually looks like. If you print c, you see:
>>> print(c)
Counter({'lie': 3, 'is': 2, 'There': 1, 'only': 1, 'passion': 1, 'and': 1, 'piece': 1, 'a': 1})
and c.items():
>>> print(c.items())
dict_items([('There', 1), ('is', 2), ('only', 1), ('passion', 1), ('and', 1), ('piece', 1), ('a', 1), ('lie', 3)])
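To address the question of whether count is a tuple: a short sketch (assuming Python 3.7+ and the same example list) that inspects one item of c.items() and the types of the names it unpacks into. The tuple is the item; after unpacking, count is a plain int:

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

# Each element yielded by c.items() is a plain 2-tuple (key, value)
first_item = next(iter(c.items()))
print(first_item, type(first_item).__name__)  # ('There', 1) tuple

# After unpacking, the two names hold the tuple's parts, not the tuple itself
word, count = first_item
print(type(word).__name__, type(count).__name__)  # str int

# So a condition like count >= 2 is an ordinary int comparison
print(count >= 2)  # False
```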
Counter returns a dictionary-like structure, so you iterate over keys and values: the key is x and the value is count. If we look closely at c.items():
c.items() #list of tuples with (key,value)
[('and', 1),
('a', 1),
('lie', 3),
('is', 2), # x->'is' ,count->2
('There', 1),
('only', 1),
('passion', 1),
('piece', 1)]
So as you iterate over this list, each tuple has two components: a word and its associated count. The condition checks whether count>=2, and if so, it returns that key, which in the list comprehension is x.
[x for x, ...] is just using x as a variable while iterating over some iterable.
x, count captures the two items that serve as iterated values from c.items().
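The x, count part is ordinary tuple unpacking. A small sketch, using a few hand-written example pairs (not produced by the Counter above), of the same unpacking done once on a single tuple and then once per iteration of a for loop:

```python
# Unpacking a single tuple: the names on the left receive the elements in order
pair = ('is', 2)
x, count = pair          # x -> 'is', count -> 2
print(x, count)          # is 2

# The same unpacking happens on every iteration of a for loop,
# and likewise in the for-clause of a list comprehension
pairs = [('and', 1), ('lie', 3), ('is', 2)]
words = []
for x, count in pairs:
    words.append(x)
print(words)             # ['and', 'lie', 'is']
```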
If you were to run for _ in c.items(): print(_), it would print one tuple per line, each shaped like (x, count).
[x for x, count in c.items() if count >= 2] just keeps x in the resulting list while using count as a filter.
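The comparison in the filter decides which words survive, and count itself never reaches the output. A sketch assuming Python 3.7+ and the question's list, contrasting a strict threshold with the "at least twice" threshold the question describes:

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

# count is only used to decide whether to keep x
strictly_more = [x for x, count in c.items() if count > 2]
at_least_two = [x for x, count in c.items() if count >= 2]

print(strictly_more)  # ['lie']
print(at_least_two)   # ['is', 'lie']
```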
Let's break it down into lines:
vocab = [  # line0
    x  # line1
    for  # line2
    x, count  # line3
    in  # line4
    c.items()  # line5
    if  # line6
    count>=2]  # line7
Each tuple from c.items() is composed of a key, x (the thing that was counted), and a count (the number of times that key was seen).
On each loop, you can imagine the next tuple is pulled, then unpacked, so that instead of needing to use a single value with indices 0 and 1, you can just refer to them by name; anontuple[0] becomes x, anontuple[1] becomes count.
The count>=2 line then filters the results; if count is less than 2, we stop processing this item and pull the next one.
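To watch the pull, unpack, and filter steps happen one item at a time, here is a tracing sketch (assuming Python 3.7+ and the question's list; the name anontuple follows the text above, and the print calls exist only for illustration):

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

vocab = []
for anontuple in c.items():
    x, count = anontuple          # same as x = anontuple[0]; count = anontuple[1]
    if count < 2:
        print(f'skip {x!r} (count={count})')
        continue                  # stop processing this item, pull the next one
    print(f'keep {x!r} (count={count})')
    vocab.append(x)

print(vocab)  # ['is', 'lie']
```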
The plain x on the far left is the item to produce; when the filtering check is passed, we shove the corresponding x into the resulting list unmodified.
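Because the leftmost expression is what gets produced, it does not have to be the bare x. A sketch (Python 3.7+, same example list; the variations are illustrative only) where the same loop and filter yield different results depending on that expression:

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

# Only the leftmost expression changes; the for-clause and filter stay the same
words = [x for x, count in c.items() if count >= 2]           # x unmodified
pairs = [(x, count) for x, count in c.items() if count >= 2]  # keep both parts
shouted = [x.upper() for x, count in c.items() if count >= 2] # transform x

print(words)    # ['is', 'lie']
print(pairs)    # [('is', 2), ('lie', 3)]
print(shouted)  # ['IS', 'LIE']
```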
Converting to a regular loop, it would look like this (lines matched to listcomp lines):
vocab = []  # line0
for x, count in c.items():  # lines 2-5
    if count >= 2:  # lines 6-7
        vocab.append(x)  # line1
If unpacking is confusing to you, you could instead imagine it as:
vocab = []  # line0
for item in c.items():  # lines 2, 4 and 5
    x = item[0]  # line3
    count = item[1]  # line3
    if count >= 2:  # lines 6-7
        vocab.append(x)  # line1