I was asked this in a technical interview, and I would like to know whether my answer is totally wrong.
The interviewer asked me to diff two lists. Here are the examples:
[1, 2, 3, 4], [1, 2, 3] => [4]
[1, 2, 2, 2], [1, 2] => [2, 2]
[1, 2, 2, 2], [1, 2, 3, 3] => [2, 2]
def diff_two_list(list1, list2):
    hash_map1 = {}
    for i in list1:
        hash_map1[i] = hash_map1.get(i, 0) + 1
    hash_map2 = {}
    for j in list2:
        hash_map2[j] = hash_map2.get(j, 0) + 1
    result = []
    for i in hash_map1.keys():
        if i not in hash_map2:
            for _ in range(hash_map1[i]):
                result.append(i)
        else:
            remained_value = hash_map1[i] - hash_map2[i]
            if remained_value > 0:
                for _ in range(remained_value):
                    result.append(i)
    return result
I realize this is not the best code, but I was wondering whether my solution is totally wrong, and what its time complexity is. I thought about asking on codereview.stackexchange.com, but they require the code to be correct before it can be reviewed, so I'm asking here instead.
The time complexity I answered was 2O(n).
When you are interviewing for a job, the interviewer is looking for many things in a candidate. Some examples:
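The interviewer asked you to diff two lists and provided a set of examples. I looked at the data and interpreted it as a position-wise comparison of the left array with the right array, where an element of the left array is kept if it differs from the element at the same position in the right array, and any elements of the left array beyond the end of the right array are kept as well.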
If you stare at the test cases long enough, you could probably come up with other interpretations.
I would expect the best candidates to ask questions about the data set or explain the assumptions behind the interpretation of the problem and the edge cases.
For a simple problem like this, I would expect the best candidates to write code that you could quickly understand without having to spend significant effort trying to think through what the code is doing.
I would expect the best candidates to solve the problem in a time or space efficient way depending on their assumptions.
In this case I would expect the candidate to produce an O(n) solution.
As an interviewer, I would consider your answer difficult to understand and inefficient. Because of the nested loops, your solution may not be O(n).
I would probably not spend much time trying to figure out the time complexity of your solution or if it would work. I would ask questions to make sure the question was reasonably clear and move on to the next skill or fit question.
I would solve the question as follows:
test_cases = [
    [[1, 2, 3, 4], [1, 2, 3], [4]],
    [[1, 2, 2, 2], [1, 2], [2, 2]],
    [[1, 2, 2, 2], [1, 2, 3, 3], [2, 2]]
]
def left_diff_array(left, right):
    smallest_length = min(len(left), len(right))
    differences = []
    # Compare the overlapping positions, starting at index 0.
    for x in range(smallest_length):
        if left[x] != right[x]:
            differences.append(left[x])
    # Any extra elements at the end of the left list are also differences.
    if len(left) > len(right):
        differences += left[len(right):]
    return differences
for test in test_cases:
    first, second, answer = test
    assert left_diff_array(first, second) == answer
    print(first, second, "=>", answer)
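A more compact sketch of the same position-wise interpretation (the name `left_diff_zip` is hypothetical) pairs elements with `zip`, which stops at the shorter list, so there is no index arithmetic to get wrong. The last call shows why asking about the interpretation matters: on `[3, 1]` vs `[1, 3]` the position-wise reading keeps both elements, whereas a count-based reading such as the one in the question would return an empty list.

```python
def left_diff_zip(left, right):
    """Position-wise diff: keep left[k] where it differs from right[k],
    plus any elements of left beyond the end of right."""
    # zip() pairs elements only up to the length of the shorter list.
    differences = [x for x, y in zip(left, right) if x != y]
    # Slicing past the end of a list returns [], so this is safe when left is shorter.
    return differences + left[len(right):]


for first, second, expected in [
    ([1, 2, 3, 4], [1, 2, 3], [4]),
    ([1, 2, 2, 2], [1, 2], [2, 2]),
    ([1, 2, 2, 2], [1, 2, 3, 3], [2, 2]),
]:
    assert left_diff_zip(first, second) == expected

print(left_diff_zip([3, 1], [1, 3]))  # [3, 1]
```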
Assuming the lists are sorted, you can iterate over each list once, which is O(n):
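def diff_list(a, b):
    # i walks b and j walks a; m is the current value of b, n the current value of a.
    i, j = iter(b), iter(a)
    try:
        m, n = next(i), next(j)
        while True:
            if m == n:
                # Matching pair: drop one copy from each list.
                m, n = next(i), next(j)
                continue
            if m < n:
                try:
                    m = next(i)
                except StopIteration:
                    # b is exhausted, so the current a value is part of the diff.
                    yield n
                    raise
            else:
                # n is smaller than every remaining value of b, so it is in the diff.
                yield n
                n = next(j)
    except StopIteration:
        # Whatever is left of a has nothing left to match in b.
        yield from j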
>>> list(diff_list([1, 2, 3, 4], [1, 2, 3]))
[4]
>>> list(diff_list([1, 2, 2, 2], [1, 2]))
[2, 2]
>>> list(diff_list([1, 2, 2, 2], [1, 2, 3, 3]))
[2, 2]
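The generator above relies on both inputs being sorted; on unsorted input it can silently return a wrong result. One option, sketched below with an index-based two-pointer loop instead of iterators (the function name and the `assume_sorted` flag are hypothetical), is to sort copies first. That raises the cost to O(n log n + m log m), while the merge step itself stays linear.

```python
def diff_two_pointer(a, b, assume_sorted=False):
    """Multiset difference a - b via a two-pointer merge.

    If assume_sorted is False, sorted copies are made first
    (O(n log n + m log m)); the merge itself is O(n + m).
    """
    if not assume_sorted:
        a, b = sorted(a), sorted(b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Matching pair: discard one copy from each list.
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] cannot appear later in b, because b is sorted.
            result.append(a[i])
            i += 1
        else:
            # b[j] has no partner in a; skip it.
            j += 1
    # Whatever is left of a had nothing to match in b.
    result.extend(a[i:])
    return result


print(diff_two_pointer([4, 2, 1, 3], [3, 1, 2]))  # [4]
```

Note that the result comes back in sorted order rather than in the original order of `a`.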
I think it is O(m+n), which is maybe what you meant by 2O(n) / O(2n). You have to iterate over both lists. A shorter version might make that clearer, even though it is probably not optimal:
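from collections import Counter
def diff_two_list(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    # Repeat each distinct x from list1 (count in list1 - count in list2) times;
    # a zero or negative repeat count produces an empty list.
    return [y for x in c1 for y in ([x] * (c1[x] - c2[x]))]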
The two Counter calls iterate over both lists once. The comprehension that builds the result is bounded by the length of the first list (m), because the diff cannot contain more elements than the first list. The dict set-item and get-item operations inside the loops are O(1) on average ( Time complexity of python dict ).
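`collections.Counter` also supports subtraction directly: `c1 - c2` keeps only the positive counts, and `elements()` expands the counts back into repeated items. A sketch of the same function written that way (the name `diff_two_list_counter` is hypothetical); it is still O(n + m), and equal values come out grouped together rather than in their original positions:

```python
from collections import Counter


def diff_two_list_counter(list1, list2):
    # Counter subtraction drops keys whose count would be zero or negative,
    # so values that occur only in list2 never show up in the result.
    remaining = Counter(list1) - Counter(list2)
    # elements() repeats each key as many times as its count.
    return list(remaining.elements())


print(diff_two_list_counter([1, 2, 2, 2], [1, 2, 3, 3]))  # [2, 2]
```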
This answer focuses only on the algorithm for sorted lists.
At every moment in the program you are looking at one value from each list, a and b. When a is less than b, you know that a cannot be equal to any later value in list_b, since the lists are sorted, so a is added to the difference and you move on to the next value in list_a. The same applies in reverse when b is lower.
An implementation of this using iterators could look something like this:
def dif_two_list(list_a, list_b):
    diff = []
    it_a = iter(list_a)
    it_b = iter(list_b)
    def next_a():
        try:
            return next(it_a)
        except StopIteration:
            #diff.append(b) #uncomment if you want to keep the values in the second list
            raise
    def next_b():
        try:
            return next(it_b)
        except StopIteration:
            diff.append(a)
            raise
    try:
        a = b = None
        while True:
            if a == b:
                #discard both, they are the same
                a = next(it_a) #this ended up being the only one that didn't need its own try/except
                #if this raises the error we don't want to keep the value of b
                b = next_b() #however, if this one raises an error we are interested in the 'a' value gotten right above
            elif a < b:
                #a is not in it_b
                diff.append(a)
                a = next_a()
            else:
                #b is not in it_a
                #diff.append(b) #uncomment if you are interested in the values in the second list
                b = next_b()
    except StopIteration:
        #when one is exhausted, extend the difference by the other
        # (extending by the one just emptied does nothing, which is easier than checking which one to extend by)
        diff.extend(it_a)
        #diff.extend(it_b) #uncomment if you are interested in the values in the second list
    return diff
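I'm not completely certain how this relates to time complexity, but every element of list_a and list_b is retrieved with next at most once, and every pass through the loop retrieves at least one element, so the total work is proportional to len(list_a) + len(list_b). I believe that makes this O(n+m).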
I think this snippet should do the trick, with a time complexity of O(n+m) and a space complexity of O(n), where n and m are the lengths of the respective lists.
I think with a little extra effort someone could improve this code. I'm avoiding the creation of a second dictionary to reduce the memory cost in case list size becomes an issue.
Code readability is a must when you're applying for a programming job. A good practice is to think out loud while you're analyzing the technical problem; the interviewer may give you a hint about what they are expecting.
I don't know if links are allowed here, but I think these may help:
Work at Google — Example Coding/Engineering Interview
Interview Cheat Sheet - Andrei Neagoie's - Data Structures + Algorithms
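def diff_two_lists(list1, list2):
    # Count how many times each item occurs in list1.
    dictionary = {}
    for item in list1:
        if item not in dictionary:
            dictionary[item] = 1
        else:
            dictionary[item] += 1
    # Cancel one occurrence for every matching item in list2.
    for item in list2:
        if item in dictionary:
            dictionary[item] -= 1
    # Expand the remaining counts; range() of a zero or negative count adds nothing.
    diff = []
    for key, value in dictionary.items():
        for _ in range(value):
            diff.append(key)
    return diff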
print(diff_two_lists([1, 2, 3, 4], [1, 2, 3]))
print(diff_two_lists([1, 2, 2, 2], [1, 2]))
print(diff_two_lists([1, 2, 2, 2], [1, 2, 3, 3]))
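Counting-dictionary approaches like the one above group equal values together in the output (for example, `[3, 1, 3, 2]` minus `[3]` comes back as `[3, 1, 2]`). If the original order of `list1` should be kept, one possible variant (the name `diff_keep_order` is hypothetical) counts only `list2` and then walks `list1` once, dropping an item while there are still copies of it to remove. It is still O(n + m) time, and the only dictionary it builds is for the distinct values of `list2`.

```python
from collections import Counter


def diff_keep_order(list1, list2):
    # How many copies of each value still need to be removed from list1.
    to_remove = Counter(list2)
    result = []
    for item in list1:
        # A missing key reads as 0 in a Counter, so no membership check is needed.
        if to_remove[item] > 0:
            # Cancel this occurrence against one copy in list2.
            to_remove[item] -= 1
        else:
            result.append(item)
    return result


print(diff_keep_order([3, 1, 3, 2], [3]))  # [1, 3, 2]
```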