I do not understand how this flatten with extend and append works for Python 3.6

Question

I put the code below in Pythontutor.com to see if I could understand how this works. However, despite reading up on flattening, extend, and append I am a little lost. My question is why does it evaluate 'b' twice? For example, it goes to extend then creates a newlist and then takes 'b' to the else and appends? I would appreciate any help that will make this more clear to me.

aList = ['b','a','c',2],[[[3]],'dog',4,5]  
def flatten(aList):
    newList = [ ]
    for item in aList:
        if type(item) == type([]):
            newList.extend(flatten(item))
        else:
            newList.append(item)
    return newList

print(flatten(aList))

Answer 1

The function uses recursion to call itself again. The idea is that you break down a larger problem into smaller parts that you each solve independently, then combine the results to solve the larger problem.

Here, flatten() will call itself again whenever a contained element in the current sequence is a list. These recursive calls continue until the smaller part no longer contains more lists.

The thing to remember is that local names such as newList are local to each function call. Even when flatten() calls itself, each call creates its own newList, independent of the newList in every other call.
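To make the "one newList per call" point concrete, here is a small sketch. It is the function from the question with two additions that are mine, not the asker's: a `log` list and a `depth` counter (both hypothetical names) that record the local list each call creates.

```python
def flatten_logged(aList, log, depth=0):
    newList = []                  # a brand-new list, local to this call only
    log.append((depth, newList))  # keep a reference so it can be inspected later
    for item in aList:
        if type(item) == type([]):
            # the recursive call builds and returns its own, separate newList
            newList.extend(flatten_logged(item, log, depth + 1))
        else:
            newList.append(item)
    return newList

aList = ['b', 'a', 'c', 2], [[[3]], 'dog', 4, 5]
log = []
result = flatten_logged(aList, log)

# One entry per call, indented by recursion depth, showing that call's final newList
for depth, local_list in log:
    print('  ' * depth + repr(local_list))
```

Five calls are made for this input, so five distinct list objects are logged, one per call.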

For your input, a tuple:

['b', 'a', 'c', 2], [[[3]], 'dog', 4, 5]

the first element is a list too:

['b', 'a', 'c', 2]

so that's passed to a new flatten() call. There are no more lists in that sub-list, so all that call does is append each item to its own newList and return that list as the result. Once it returns, the first flatten() call resumes and adds the items of the returned list to its local newList with an extend() call.
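Why extend() for the returned list but append() for a plain item? A short sketch of the difference, using made-up variable names (`outer`, `returned`) purely for illustration:

```python
outer = ['x']
returned = ['b', 'a', 'c', 2]   # stands in for what a recursive call returns

appended = list(outer)
appended.append(returned)       # adds the list itself as ONE new element

extended = list(outer)
extended.extend(returned)       # adds each element of the list separately

print(appended)  # ['x', ['b', 'a', 'c', 2]]
print(extended)  # ['x', 'b', 'a', 'c', 2]
```

If flatten() used append() on the result of the recursive call, the nesting would be preserved rather than removed.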

When you look at how Pythontutor visualises this, you'll note that there are a lot of pointers to those lists within the original object.

You can see that the first flatten() call references a tuple with two elements, and that the second flatten() call references the first element of that tuple, the contained list. Python values all live in a dedicated area of memory called the 'heap'. Names and list elements are just labels, references, nametags with strings attached to those objects, and you can have any number of such labels. See Ned Batchelder's excellent article on the subject. Both flatten() calls have their own newList reference pointing to a list object, and the currently active flatten() call is busy copying the values from the aList reference it has to newList.

Once the recursive call to flatten() returns, control goes back to the outer, still active flatten() call. After its local newList list has been extended with the returned values, that call moves on to the next element, [[[3]], 'dog', 4, 5], which has a few more lists to process: first [[3]], then [3], after which there are no more nested lists.

If you write this all out with indentation for new calls, you get:

-> flatten((['b', 'a', 'c', 2], [[[3]], 'dog', 4, 5]))
newList is set to an empty list
item is set to ['b', 'a', 'c', 2]
type(item) is list, so recurse
- -> flatten(['b', 'a', 'c', 2])
- newList is set to an empty list
- item is set to 'b', not a list, appended to newList, now ['b']
- item is set to 'a', not a list, appended to newList, now ['b', 'a']
- item is set to 'c', not a list, appended to newList, now ['b', 'a', 'c']
- item is set to 2, not a list, appended to newList, now ['b', 'a', 'c', 2]
- loop is done, return newList
- <- ['b', 'a', 'c', 2]
newList is extended with ['b', 'a', 'c', 2], so now ['b', 'a', 'c', 2]
item is set to [[[3]], 'dog', 4, 5]
type(item) is list, so recurse
- -> flatten([[[3]], 'dog', 4, 5])
- newList is set to an empty list
- item is set to [[3]]
- type(item) is list, so recurse
  - -> flatten([[3]])
  - newList is set to an empty list
  - item is set to [3]
  - type(item) is list, so recurse
    - -> flatten([3])
    - newList is set to an empty list
    - item is set to 3, not a list, appended to newList, now [3]
    - loop is done, return newList
    - <- [3]
  - newList is extended with [3], so now [3]
  - loop is done, return newList
  - <- [3]
- newList is extended with [3], so now [3]
- item is set to 'dog', not a list, appended to newList, now [3, 'dog']
- item is set to 4, not a list, appended to newList, now [3, 'dog', 4]
- item is set to 5, not a list, appended to newList, now [3, 'dog', 4, 5]
- loop is done, return newList
- <- [3, 'dog', 4, 5]
newList is extended with [3, 'dog', 4, 5], so now ['b', 'a', 'c', 2, 3, 'dog', 4, 5]
loop is done, return newList
<- ['b', 'a', 'c', 2, 3, 'dog', 4, 5]
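If you would rather have Python produce a trace like this than follow it by hand, the sketch below is one way to do it. The `depth` and `trace` parameters and the name `flatten_traced` are my additions for illustration; the flattening logic is unchanged from the question. It uses f-strings, which need Python 3.6 or later.

```python
def flatten_traced(aList, trace, depth=0):
    pad = '  ' * depth                       # indent by recursion depth
    trace.append(f'{pad}-> flatten({aList!r})')
    newList = []
    for item in aList:
        if type(item) == type([]):
            # nested list: recurse, then merge the returned items
            newList.extend(flatten_traced(item, trace, depth + 1))
            trace.append(f'{pad}extended, newList is now {newList!r}')
        else:
            # plain value: add it as-is
            newList.append(item)
            trace.append(f'{pad}appended {item!r}, newList is now {newList!r}')
    trace.append(f'{pad}<- {newList!r}')
    return newList

aList = ['b', 'a', 'c', 2], [[[3]], 'dog', 4, 5]
trace = []
result = flatten_traced(aList, trace)
print('\n'.join(trace))
```

Each "->" line marks a new call and each "<-" line marks its return value, so the indentation mirrors the hand-written trace above.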

In the default visualisation that Pythontutor uses for Python code, the fact that you see "b" twice is an artifact of a simplification. While lists and tuples are shown as separate objects with arrows showing how they are referenced, 'primitive' types such as strings and integers are shown inside the lists or directly inside variables in function frames.

In reality, these objects too are separate, and they too live on the heap and are referenced. That "b" value exists as a single object, with multiple lists referencing it. You can pick a different visualisation, however:

With that option, the visualisation becomes a lot larger:

Here you can see that both newList in the active function frame and the original list object referenced from the input tuple reference a single str object with value "b". But you can perhaps see that with this level of detail things are a bit too verbose to take in in one go.
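You can confirm the "single object, several references" point without the visualiser by comparing identities with the `is` operator. This is a minimal sketch; `marker` is a hypothetical stand-in object I added so the check does not depend on how Python happens to store string literals.

```python
def flatten(aList):
    newList = []
    for item in aList:
        if type(item) == type([]):
            newList.extend(flatten(item))
        else:
            newList.append(item)
    return newList

aList = ['b', 'a', 'c', 2], [[[3]], 'dog', 4, 5]
flat = flatten(aList)

# The flattened list holds a reference to the very same 'b' object
print(flat[0] is aList[0][0])        # True

# The same holds for any object: append() and extend() store references, not copies
marker = object()
print(flatten([[marker]])[0] is marker)   # True
```

So nothing is evaluated or duplicated twice; the visualiser just draws the same value in two places.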

Answer 2

Perhaps it would be easier to understand if it were written more simply:

aList = ['b','a','c',2],[[[3]],'dog',4,5]  
def flatten(value):
    if not isinstance(value,(list,tuple)) : return [value]
    return [ item for subItem in value for item in flatten(subItem) ]

If the value parameter is a list or tuple, the flattened results of its elements are concatenated to form the output (2nd line). Because each of these elements could itself be a list or tuple, the function calls itself to flatten the item before concatenating it with the others. The function stops calling itself when its parameter is a scalar value (i.e. not a list or tuple). In that case it returns the value as a single-element list (1st line), because it cannot be flattened further and its caller (itself) expects a list.

flatten( aList ) : returns ['b']+['a']+['c']+[2]+[3]+['dog']+[4]+[5]

   --> flatten( ['b','a','c',2] ) : returns ['b']+['a']+['c']+[2]
         --> flatten('b') : returns ['b']     
         --> flatten('a') : returns ['a']     
         --> flatten('c') : returns ['c']     
         --> flatten(2)   : returns [2] 

   --> flatten( [[[3]],'dog',4,5] ): returns [3]+['dog']+[4]+[5]
         --> flatten([[3]]) : returns [3]
               --> flatten([3]) : returns [3]
                     --> flatten(3) : returns [3]
         --> flatten('dog') : returns ['dog']
         --> flatten(4)     : returns [4]
         --> flatten(5)     : returns [5]
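As a quick check, the sketch below runs both versions side by side. The names `flatten_loop` and `flatten_comprehension` are mine, used only to keep the two apart. For the input from the question they agree, but they are not identical in general: the loop version only descends into lists, while this version also descends into tuples.

```python
def flatten_loop(aList):
    # version from the question: only nested *lists* are flattened
    newList = []
    for item in aList:
        if type(item) == type([]):
            newList.extend(flatten_loop(item))
        else:
            newList.append(item)
    return newList

def flatten_comprehension(value):
    # version from this answer: lists *and* tuples are flattened
    if not isinstance(value, (list, tuple)):
        return [value]
    return [item for subItem in value for item in flatten_comprehension(subItem)]

aList = ['b', 'a', 'c', 2], [[[3]], 'dog', 4, 5]
print(flatten_loop(aList) == flatten_comprehension(aList))   # True

nested_tuple = [1, (2, [3])]
print(flatten_loop(nested_tuple))            # [1, (2, [3])]
print(flatten_comprehension(nested_tuple))   # [1, 2, 3]
```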
