
python list comprehension without in

If flattened is just a list of strings, for example

['There','is','only','passion','and','piece','is','a','lie','lie','lie']

then in the following two lines

c = Counter(flattened)
vocab = [x for x, count in c.items() if count>=2]

What does the part [x for x, ...] mean? Also, shouldn't count be a tuple, since I assume it is a Counter item? How does the count>=2 part work?

Note: I understand from debugging that the first line converts the list into a Counter and the second one removes the items that occurred fewer than twice, but I can't really interpret the syntax.

So the syntax here is a little confusing, but what's actually happening is that each item in c.items() is a tuple containing a word and its count.

A clearer way of writing this would be:

vocab = [x for (x, count) in c.items() if count>=2]

but it could also be done like this:

vocab = [x[0] for x in c.items() if x[1]>=2]

where x is a tuple.
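As a quick check, here is a minimal sketch (assuming Python 3 and the word list from the question) showing that the two spellings select the same words. The names by_unpacking and by_index are only for illustration.

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

# Unpack each (word, count) tuple into two names.
by_unpacking = [x for (x, count) in c.items() if count >= 2]

# Keep each tuple whole and index into it instead.
by_index = [item[0] for item in c.items() if item[1] >= 2]

print(by_unpacking)
print(by_index)
print(by_unpacking == by_index)  # True
```

Both lists contain 'is' and 'lie', the only words that occur at least twice.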

It can also be helpful to look at what c actually looks like. If you print c, you see:

>>> print c
Counter({'lie': 3, 'is': 2, 'and': 1, 'a': 1, 'There': 1, 'only': 1, 'passion': 1, 'piece': 1})

and c.items()

>>> print c.items()
[('and', 1), ('a', 1), ('lie', 3), ('is', 2), ('There', 1), ('only', 1), ('passion', 1), ('piece', 1)]

Counter returns a dictionary-like structure, so you need to iterate over keys and values together: the key is x and the value is count. If we look closely at c.items():

c.items() #list of tuples with (key,value)

[('and', 1),
 ('a', 1),
 ('lie', 3),
 ('is', 2), # x->'is' ,count->2
 ('There', 1),
 ('only', 1),
 ('passion', 1),
 ('piece', 1)]

So when you iterate over this list, each tuple has two components: a word and its associated count. The condition checks whether count>=2; if it is, the list comprehension returns the key, which is x.
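To address the "shouldn't count be a tuple" part directly, here is a small sketch, assuming Python 3 and a shortened word list made up for the example. It records the type of each item and of the two pieces it unpacks into. Note that in Python 3, c.items() is a view object rather than a list, but iterating over it still yields tuples.

```python
from collections import Counter

c = Counter(['is', 'is', 'lie', 'lie', 'lie', 'a'])

# Record the type of each item and of the two values it unpacks into.
kinds = []
for item in c.items():
    x, count = item
    kinds.append((type(item).__name__, type(x).__name__, type(count).__name__))

print(kinds)                     # every entry is ('tuple', 'str', 'int')
print(type(c.items()).__name__)  # dict_items
```

The item is the tuple; count is the plain integer inside it, which is why count>=2 works.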

[x for x, ...] just uses x as a variable while iterating over some iterable.

x, count captures the two elements of each tuple that c.items() yields.

If you were to run for _ in c.items(): print(_), it would print one tuple per line, each of the form (x, count).

[x for x, count in c.items() if count >= 2] just keeps x in the resulting list while using count as the filter.

Let's break it down into lines:

vocab = [           # line0
         x          # line1
         for        # line2
         x, count   # line3
         in         # line4
         c.items()  # line5
         if         # line6
         count>=2]  # line7

Each tuple from c.items() is composed of a key, x (the thing that was counted), and a count (the number of times that key was seen).

On each loop, you can imagine the next tuple is pulled, then unpacked, so that instead of needing to use a single value with indices 0 and 1, you can just refer to them by name; anontuple[0] becomes x, anontuple[1] becomes count.

The count>=2 line then filters the results; if count is less than 2, we stop processing this item and pull the next one.
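Here is a minimal sketch of that unpacking step on its own, outside any loop. The tuple ('lie', 3) is taken from the example data, and the three-element tuple is made up to show what happens when the number of names does not match.

```python
anontuple = ('lie', 3)

# Unpacking assigns each element of the tuple to a name, left to right.
x, count = anontuple
print(x == anontuple[0], count == anontuple[1])  # True True

# The number of names must match the number of elements.
try:
    x, count = ('lie', 3, 'extra')
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

print(unpack_error)
```

This works in the comprehension because every item of c.items() is a tuple with exactly two elements.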

The plain x on the far left is the item to produce; when the filtering check is passed, we shove the corresponding x into the resulting list unmodified.

Converting to a regular loop, it would look like this (lines matched to listcomp lines):

vocab = []                  # line0
for x, count in c.items():  # lines 2-5
    if count >= 2:          # lines 6-7
        vocab.append(x)     # line1

If unpacking is confusing to you, you could instead imagine it as:

vocab = []              # line0
for item in c.items():  # lines 2, 4 and 5
    x = item[0]         # line3
    count = item[1]     # line3
    if count >= 2:      # lines 6-7
        vocab.append(x) # line1
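If you want to convince yourself that the loop and the comprehension really do the same thing, a rough sketch like the following compares them on the question's data (assuming Python 3). The function names vocab_comprehension and vocab_loop are hypothetical, introduced only for this comparison.

```python
from collections import Counter


def vocab_comprehension(c):
    # The original one-liner.
    return [x for x, count in c.items() if count >= 2]


def vocab_loop(c):
    # The same logic written as an explicit loop.
    vocab = []
    for x, count in c.items():
        if count >= 2:
            vocab.append(x)
    return vocab


c = Counter(['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie'])

same = vocab_comprehension(c) == vocab_loop(c)
print(same)  # True
```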

