
How to remove an item from a large list once it has been used in Python, to save memory?

I have a large list containing millions of items, and I want to iterate through each of them. Once I use an item it will never be used again, so how do I delete it from the list after use? What is the best approach? I know NumPy is fast and efficient, but I want to know how this can be done with a normal list.

mylst = [item1, item2,............millions of items]
for each_item in mylst:
    #use the item
    #delete the item to free that memory

You cannot delete an object directly in Python - an object's memory is automatically reclaimed, by garbage collection, when it's no longer possible to reference the object. So long as an object is in a list, it may be referenced again later (via the list).
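
As a rough illustration of that point, the sketch below uses a weak reference to observe when an object is actually reclaimed. The Payload class is a hypothetical stand-in for a large item, and the immediate reclamation relies on CPython's reference counting; other interpreters may free the object later.

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in for a large item."""

items = [Payload() for _ in range(3)]
probe = weakref.ref(items[-1])   # a weak reference does not keep the object alive

last = items.pop()               # the list no longer references the object...
print(probe() is not None)       # ...but `last` still does, so it is still alive

del last                         # drop the final strong reference
gc.collect()                     # not needed on CPython, harmless elsewhere
print(probe() is None)           # the object has been reclaimed
```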

So you need to destroy the list too. For example, like so:

while mylst:
    each_item = mylst.pop()  # removes an object from the end of the list
    # use the item
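
Note that pop() takes items from the end, so the loop above visits them in reverse order. If the original order matters, one possible sketch is to reverse the list in place first and then pop from the end. The names consume_in_order and handle are made up for this example.

```python
def consume_in_order(items, handle):
    """Process items in their original order, emptying the list as it goes."""
    items.reverse()            # in place, no copy; the first item is now last
    while items:
        item = items.pop()     # O(1) removal from the end of the list
        handle(item)

seen = []                      # demo only; a real handler would not keep the items
data = ["a", "b", "c"]
consume_in_order(data, seen.append)
print(seen)   # ['a', 'b', 'c']
print(data)   # []
```

Popping from the front with mylst.pop(0) would also preserve the order, but each such call shifts every remaining element, which is slow for millions of items.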

Assuming you can copy a list (memory constraints might cause issues here) and only need to remove specific elements from it, you can create a shallow copy of the list and remove elements from it while iterating through the original list:

a_list = [1, 2, 3, 4, 5]
b_list = a_list.copy()
removal_key = 0
for element in a_list:
    if element % 2 == 0:
        b_list.pop(removal_key)
        removal_key -= 1  # step the key back after every deletion, because b_list is now one element shorter than a_list
    removal_key += 1
print(b_list) #[1, 3, 5]

If creating a second list is not an option, you can store the indices of the elements to be removed and then remove them in a second pass:

a_list = [1, 2, 3, 4, 5]
elements_to_remove = []
for key, element in enumerate(a_list):
    if element % 2 == 0:
        elements_to_remove.append(key)

removed_element_count = 0
for key in elements_to_remove:
    # every earlier pop has shifted the remaining elements one position left
    a_list.pop(key - removed_element_count)
    removed_element_count += 1
print(a_list) #[1, 3, 5]

Note that the first solution is more time-efficient (especially when removing many elements), while the second solution is more memory-efficient, especially when removing a small number of elements from the list.
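
Both variants call pop() at an arbitrary index, and each such call shifts all later elements. A possible alternative, sketched here under the assumption that the removal test can be written as a function, compacts the list in a single pass without a second list. The name remove_in_place and its should_remove parameter are made up for this example.

```python
def remove_in_place(items, should_remove):
    """Drop matching elements from `items` in one pass, without a second list."""
    write = 0                      # next slot to fill with a kept element
    for item in items:
        if not should_remove(item):
            items[write] = item    # write never runs ahead of the read position
            write += 1
    del items[write:]              # truncate the leftover tail in one step

a_list = [1, 2, 3, 4, 5]
remove_in_place(a_list, lambda element: element % 2 == 0)
print(a_list)  # [1, 3, 5]
```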

This is probably a case where you should use generators.

A generator is a function that returns an object we can iterate over, one value at a time, using the special keyword yield instead of return. Generators allow a smaller memory footprint by keeping only one element in memory per iteration.

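In Python 3.x, range works in the same spirit: it computes values on demand instead of storing them all (strictly speaking it is a lazy sequence rather than a generator; the Python 2.x equivalent is xrange).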

Overly simple example:

>>> def range(start, end):
...     current = start
...     while current < end:
...         yield current
...         current += 1
...
>>> for i in range(0, 2):
...     print(i)
...
0
1

How is this million-entry list created?
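
If the items can be produced one at a time, the full list may never need to exist. The sketch below assumes a hypothetical producer, produce_items, that computes each item on demand. In practice it might read lines from a file or rows from a database instead.

```python
def produce_items(count):
    """Hypothetical producer: yields items one at a time instead of building a list."""
    for n in range(count):
        yield n * n

total = 0
for item in produce_items(1_000_000):
    total += item      # use the item; nothing keeps the previous ones alive
print(total)
```

Only the current item is referenced at any point in the loop, so earlier items can be reclaimed as soon as the loop moves on.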

