
How to Get Last Item in a Python Generator

Question: How can I get the last item of a Python generator in a fast and memory-efficient way?

MWE:

import snscrape.modules.twitter as sntwitter
import time; start = time.time()

query = "Rooty Roo"

obj = sntwitter.TwitterSearchScraper(query)
print(obj) # didn't see much useful besides get_items

cnt = 0
items = obj.get_items()
for item in items:
  cnt += 1
  if cnt % 100 == 0:
    print(cnt)
  # end if
# end for
## the above seems ideal for memory-efficiency but 
## maybe super slow as I have no idea if there are 
## millions or billions of tweets in there. 
## Been running a few minutes and at ~17k so far.
## Not super ideal for playing around...

print(vars(item))  # 'item' still holds the last tweet after the loop
print("tweets: ", cnt)
print("executed in: ", time.time() - start)

I guess the above is not a great MWE since it relies on a package, but this is the first time I've encountered a generator, and it is what prompted this question :)

Context: I'm trying to learn more about how this package works. I started reading the source but thought playing around and inspecting the data might be faster ¯\_(ツ)_/¯

Memory-efficiency context: my laptop is turning 10 this year and I think part of the RAM is failing. Theoretically it has 8 GB of RAM, but using more than 1-2 GB causes browser pages to crash :D

Is this question answered already? Probably, but Google search results for 'python get last item of a generator' return results for iterators...

The last item of a generator cannot always be determined.

For some generators, you cannot predict when they will end, or what the last element will be:

import random

def random_series():
    while (x := random.randint(1, 10)) > 1:  # parentheses needed: := binds more loosely than >
        yield x


# print random numbers from generator until 1 is generated
for x in random_series():
    print(x)

Others will literally go on forever:

def natural_numbers():
    n = 0
    while True:
        n += 1
        yield n

# prints the first 10 natural numbers, but could go on forever
g = natural_numbers()
for _ in range(10):
    print(next(g))

However, every generator is an iterator, so you can try to get the last item (or the number of items) the same way you would for any other iterator that neither reports its length nor allows indexing.
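As a quick check of that claim, the sketch below uses a made-up generator, countdown (a hypothetical name, not part of snscrape), to show that a generator object is an iterator but supports neither len() nor indexing:

```python
from collections.abc import Iterator


def countdown(n):
    """Hypothetical example generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


g = countdown(3)

# Generators implement the iterator protocol (__iter__ and __next__).
is_iterator = isinstance(g, Iterator)

# len() is not defined for generators, so this raises TypeError.
try:
    len(g)
    has_len = True
except TypeError:
    has_len = False

# Generators are not subscriptable either, so g[-1] raises TypeError.
try:
    g[-1]
    supports_indexing = True
except TypeError:
    supports_indexing = False

print(is_iterator, has_len, supports_indexing)
```

Neither failed attempt consumes anything from the generator, so g can still be iterated afterwards.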

For iterables that do (sequences such as lists and tuples):

# if i is a sequence that allows indexing and has a length:
print('last element: ', i[-1])
print('size: ', len(i))

For iterators that don't (but at least end):

items = list(i)  # consumes the iterator, so build the list only once
print('last element: ', items[-1])
print('size: ', len(items))

However, if you try that on an infinite generator, your code will hang, or more likely crash as soon as it runs out of memory for the list. Also note that list(i) consumes the iterator: calling list(i) a second time on the same iterator returns an empty list, so assign the result to a variable if you need it more than once.
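If you only want to poke around without risking an endless loop, one option is to cap how many items you consume with itertools.islice. The following is a minimal sketch, not something from the original answer; last_of_first is a hypothetical helper name, and natural_numbers is the infinite generator from above:

```python
from itertools import islice


def natural_numbers():
    """Infinite generator: 1, 2, 3, ..."""
    n = 0
    while True:
        n += 1
        yield n


def last_of_first(iterable, limit):
    """Consume at most `limit` items and return (count, last_item).

    Returns (0, None) if the iterable yields nothing.
    """
    count, last = 0, None
    # islice stops after `limit` items, so this terminates even on infinite input.
    for count, last in enumerate(islice(iterable, limit), start=1):
        pass
    return count, last


print(last_of_first(natural_numbers(), 1000))
```

Only one item is held in memory at a time. Note that the result is the last item seen within the cap, which is the true last item only if the iterable ended before the limit was reached (that is, if count is less than limit).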

In your case:

items = list(obj.get_items())
print("tweets: ", len(items))
print("last tweet: ", items[-1])

Note: as user @kellybundy points out, creating a list is not very memory-efficient. If you don't care about the actual contents, other than the last element, this would work:

for n, last in enumerate(obj.get_items()):
    pass
# n is the number of items minus 1, and last is the last item
# (if the generator yields nothing, n and last are never assigned)

This is memory-efficient, but the contents of the generator are now lost.
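Another common idiom, offered here as a sketch rather than as part of the original answer, is collections.deque with maxlen=1. It consumes the whole iterable but keeps only the most recent item, and it copes with an empty generator. The helper name last_item is hypothetical:

```python
from collections import deque


def last_item(iterable, default=None):
    """Return the last item of a finite iterable, or `default` if it is empty."""
    # A deque with maxlen=1 discards older items as new ones arrive,
    # so at most one item is held in memory at any time.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else default


print(last_item(x * x for x in range(10)))
print(last_item(iter([]), default="empty"))
```

Applied to the question, this would be last_item(obj.get_items()). It still has to iterate through every tweet, so it saves memory, not time, and it never returns on an infinite generator.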
