Python diff two lists what's the time complexity?

Question

I was asked this in a technical interview, and I would like to know whether my answer is totally wrong or not.

The interviewer asked me to diff two lists. Here are the examples:

[1, 2, 3, 4], [1, 2, 3] => [4]
[1, 2, 2, 2], [1, 2] => [2, 2]
[1, 2, 2, 2], [1, 2, 3, 3] => [2, 2]


def diff_two_list(list1, list2):
  hash_map1 = {}
  for i in list1:
    hash_map1[i] = hash_map1.get(i, 0) + 1

  hash_map2 = {}
  for j in list2:
    hash_map2[j] = hash_map2.get(j, 0) + 1

  result = []
  for i in hash_map1.keys():
    if i not in hash_map2:
      for _ in range(hash_map1[i]):
        result.append(i)
    else:
      remained_value = hash_map1[i] - hash_map2[i]
      if remained_value > 0:
        for _ in range(remained_value):
          result.append(i)

  return result

I realize that this is not the best code. Is my solution totally wrong? And what is the time complexity of this solution? I thought about asking on codereview.stackexchange.com, but they said the code has to be correct to ask for a review, so I'm asking here instead.

The time complexity I answered was 2o(n)

Answer 1

When you are interviewing for a job the interviewer is looking for many things about the candidate. Some examples:

Does the candidate ask the right questions about the problem?
Is the candidate experienced in a skill that I value?
Can the candidate think through a problem and come up with a reasonable answer?
Does the candidate write clean code that will be maintainable within my team?

The interviewer asked you to do the diff between two lists and provided a solution set. I looked at the data and interpreted it as a position-wise comparison of the left array with the right array where:

The resulting list would include the values on the left when they are different
The resulting list would include the values on the left when the position on the right is empty

If you stare at the test cases long enough, you could probably come up with other interpretations.
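To make the ambiguity concrete, here is a minimal sketch of two readings that both reproduce the three examples from the question yet disagree on other input. The names positional_diff and multiset_diff are hypothetical and chosen only for illustration; neither is claimed to be what the interviewer had in mind.

```python
from collections import Counter


def positional_diff(left, right):
    # Keep left[i] when right has no element at index i, or a different one.
    return [x for i, x in enumerate(left) if i >= len(right) or x != right[i]]


def multiset_diff(left, right):
    # Cancel each element of right against one equal element of left,
    # regardless of position; keep the leftovers in left's order.
    remaining = Counter(right)
    leftover = []
    for x in left:
        if remaining[x] > 0:
            remaining[x] -= 1
        else:
            leftover.append(x)
    return leftover


examples = [
    ([1, 2, 3, 4], [1, 2, 3], [4]),
    ([1, 2, 2, 2], [1, 2], [2, 2]),
    ([1, 2, 2, 2], [1, 2, 3, 3], [2, 2]),
]
for left, right, expected in examples:
    # Both readings agree on every example from the question.
    assert positional_diff(left, right) == expected
    assert multiset_diff(left, right) == expected

# They diverge as soon as equal values sit at different positions.
print(positional_diff([1, 2, 3], [3, 2, 1]))  # [1, 3]
print(multiset_diff([1, 2, 3], [3, 2, 1]))    # []
```

Since the given examples cannot tell the two apart, asking which one is meant is probably worth more than guessing.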

I would expect the best candidates to ask questions about the data set or explain the assumptions behind the interpretation of the problem and the edge cases.

These example lists appear sorted. Is that a coincidence?
This seems to be a position-wise difference. Am I understanding that right?
Is the right side always either the same size or smaller?
If the right side is bigger, should I be including an element?
What is the maximum size of the data set?

For a simple problem like this, I would expect the best candidates to write code that you could quickly understand without having to spend significant effort trying to think through what the code is doing.

I would expect the best candidates to solve the problem in a time or space efficient way depending on their assumptions.

In this case, I would expect the candidate to produce an O(n) solution.

As an interviewer, I would consider your answer difficult to understand and inefficient. With the nesting of the loops, your solution may not be O(n).

I would probably not spend much time trying to figure out the time complexity of your solution or if it would work. I would ask questions to make sure the question was reasonably clear and move on to the next skill or fit question.

I would solve the question as follows:

test_cases = [
    [[1, 2, 3, 4], [1, 2, 3], [4]],
    [[1, 2, 2, 2], [1, 2], [2, 2]],
    [[1, 2, 2, 2], [1, 2, 3, 3], [2, 2]]
]


def left_diff_array(left, right):
    smallest_length = min(len(left), len(right))
    differences = []

    for x in range(smallest_length):  # start at index 0 so the first pair is compared too
        if left[x] != right[x]:
            differences.append(left[x])

    if len(left) > len(right):
        differences += left[len(right):]

    return differences


for test in test_cases:
    first, second, answer = test

    assert left_diff_array(first, second) == answer
    print(first, second, "=>", answer)

Answer 2

Assuming the lists are sorted, you can iterate over each list once, which is O(n):

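def diff_list(a, b):
    # Yield the elements of sorted list a that have no match in sorted list b.
    i, j = iter(b), iter(a)  # note: i walks b, j walks a
    try:
        m, n = next(i), next(j)  # m is the current value from b, n from a
        while True:
            if m == n:
                # Matched pair: discard both and advance both iterators.
                m, n = next(i), next(j)
                continue
            if m < n:
                # m cannot match n or anything after it in a; skip it.
                try:
                    m = next(i)
                except StopIteration:
                    # b is exhausted, so the current n is unmatched.
                    yield n
                    raise
            else:
                # n is smaller than everything left in b, so it is unmatched.
                yield n
                n = next(j)
    except StopIteration:
        # One list ran out: whatever is left in a is unmatched.
        yield from j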

>>> list(diff_list([1, 2, 3, 4], [1, 2, 3]))
[4]
>>> list(diff_list([1, 2, 2, 2], [1, 2]))
[2, 2]
>>> list(diff_list([1, 2, 2, 2], [1, 2, 3, 3]))
[2, 2]

Answer 3

I think it is O(m+n), which is maybe what you meant by 2o(n) / o(2n). You have to iterate over both lists. A shorter version might make that clearer, even though it is probably not optimal:

from collections import Counter

def diff_two_list(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    return [y for x in c1 for y in ([x] * (c1[x] - c2[x]))]

The two Counter calls iterate over each list once. Building the resulting list is bounded by the length of the first list (m), because there cannot be more elements in the diff than in the first list. Both the dict set-item and get-item operations inside the loops are O(1) on average (see Time complexity of python dict).
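As a side note, Counter also supports subtraction directly, which drops any element whose resulting count is zero or negative. A minimal sketch of the same idea under that approach (this is an alternative formulation, not the code from the question):

```python
from collections import Counter


def diff_two_list(list1, list2):
    # Counter subtraction keeps only elements whose resulting count is positive.
    remaining = Counter(list1) - Counter(list2)
    # elements() repeats each key as many times as its remaining count.
    return list(remaining.elements())


print(diff_two_list([1, 2, 3, 4], [1, 2, 3]))     # [4]
print(diff_two_list([1, 2, 2, 2], [1, 2]))        # [2, 2]
print(diff_two_list([1, 2, 2, 2], [1, 2, 3, 3]))  # [2, 2]
```

The cost is still one pass over each list to build the counters, plus at most m elements produced, so the O(m+n) estimate is unchanged.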

Answer 4

This answer only focuses on the algorithm when the lists are sorted.

At any given moment, the program is looking at one value from each list, a and b. When a is less than b, you know that a cannot be equal to any further value in list_b, since the lists are sorted, so a is added to the difference and you move on to the next value in list_a. The same reasoning applies in reverse when b is lower.

An implementation of this using iterators could look something like this:

def dif_two_list(list_a, list_b):
    diff = []
    it_a = iter(list_a)
    it_b = iter(list_b)
    def next_a():
        try:
            return next(it_a)
        except StopIteration:
            #diff.append(b) #uncomment if you want to keep the values in the second list
            raise
    def next_b():
        try:
            return next(it_b)
        except StopIteration:
            diff.append(a)
            raise
    try:
        a = b = None
        while True:
            if a == b:
                # discard both, they are the same
                a = next(it_a)  # the only call that doesn't need its own try/except:
                                # if this raises, we don't want to keep the value of b
                b = next_b()  # but if this one raises, we want the 'a' value fetched just above
            elif a < b:
                # a is not in it_b
                diff.append(a)
                a = next_a()
            else:
                # b is not in it_a
                #diff.append(b) #uncomment if you are interested in the values in the second list
                b = next_b()
    except StopIteration:
        # when one is exhausted, extend the difference by the other
        # (extending by the one just emptied does nothing, which is easier than checking which one to extend by)
        diff.extend(it_a)
        #diff.extend(it_b) #uncomment if you are interested in the values in the second list
    return diff

I'm not completely certain how this relates to time complexity, but each element of either list is fetched at most once (plus one final call to next that raises StopIteration), so I believe that makes this O(n+m).
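The same sorted-list walk can also be sketched with two explicit indices, which may make the bound easier to see: every pass through the loop advances at least one index, so the loop runs at most len(list_a) + len(list_b) times. The name sorted_diff is hypothetical, and the sketch assumes both inputs are sorted in ascending order.

```python
def sorted_diff(list_a, list_b):
    """Return elements of sorted list_a not matched by an element of sorted list_b."""
    diff = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        a, b = list_a[i], list_b[j]
        if a == b:
            # Matched pair: discard both.
            i += 1
            j += 1
        elif a < b:
            # a cannot appear later in list_b, so it is unmatched.
            diff.append(a)
            i += 1
        else:
            # b cannot appear later in list_a; skip it.
            j += 1
    # Anything still left in list_a has nothing to match against.
    diff.extend(list_a[i:])
    return diff


print(sorted_diff([1, 2, 3, 4], [1, 2, 3]))     # [4]
print(sorted_diff([1, 2, 2, 2], [1, 2]))        # [2, 2]
print(sorted_diff([1, 2, 2, 2], [1, 2, 3, 3]))  # [2, 2]
```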

Answer 5

I think this snippet should do the trick with a time complexity of O(n+m) and a space complexity of O(n), where n and m are the lengths of the respective lists.

I think with a little extra effort someone could improve this code. I'm avoiding the creation of a second dictionary to reduce the memory cost in case the list size becomes an issue.

Code readability is a must when you're applying for a programmer job. A good practice is to think out loud while you're analyzing the technical problem. The interviewer may give you a hint about what he is expecting.

I don't know if we are allowed to do so, but I think it may help:

Work at Google — Example Coding/Engineering Interview

Interview Cheat Sheet - Andrei Neagoie's - Data Structures + Algorithms

def diff_two_lists(list1, list2):
    dictionary = {}

    for item in list1:
        if item not in dictionary:
            dictionary[item] = 1
        else:
            dictionary[item] += 1

    for item in list2:
        if item in dictionary:
            dictionary[item] -= 1

    diff = []
    for key, value in dictionary.items():
        for i in range(value):
            diff.append(key)
    return diff


print(diff_two_lists([1, 2, 3, 4], [1, 2, 3]))
print(diff_two_lists([1, 2, 2, 2], [1, 2]))
print(diff_two_lists([1, 2, 2, 2], [1, 2, 3, 3]))

5 answers

solution1 5 ACCEPTED 2016-06-03 04:55:19

solution2 1 2016-06-03 04:29:48

solution3 0 2016-06-03 03:51:58

solution4 0 2016-06-03 04:17:57

solution5 0 2020-05-17 07:17:47
