I'm trying to find a diff (longest common subsequence) between two lists of strings. I'm guessing difflib could be useful here, but difflib.ndiff annotates the output with -, +, etc. For instance:
from difflib import ndiff
t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
d = list(ndiff(t1, t2))
print(d)
['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']
Is tokenising and removing the letter-codes in the output the right way? Is this the proper Pythonic way of diffing lists?
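If all you want is the items of the first list that are not in the second, you can convert both lists to sets and take the set difference with the - operator. Note that this discards ordering and duplicates, so it is not a sequence diff.
Example: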
>>> l1 = [1,2,3,4,5]
>>> l2 = [4,5,6,7,8]
>>> print(list(set(l1) - set(l2)))
[1, 2, 3]
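As a small extension of the example above, the same operator works in the other direction, and ^ gives the symmetric difference (items in exactly one of the lists). This is only a sketch; the variable names are made up for illustration, and sorted() is used just to get a stable order out of the sets.

```python
l1 = [1, 2, 3, 4, 5]
l2 = [4, 5, 6, 7, 8]

# Items present in one list but not the other (order and duplicates are lost).
only_in_l1 = sorted(set(l1) - set(l2))
only_in_l2 = sorted(set(l2) - set(l1))

# Symmetric difference: items that appear in exactly one of the two lists.
in_exactly_one = sorted(set(l1) ^ set(l2))

print(only_in_l1)      # [1, 2, 3]
print(only_in_l2)      # [6, 7, 8]
print(in_exactly_one)  # [1, 2, 3, 6, 7, 8]
```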
Using a list comprehension (this keeps the original order of the items):
In [16]: l1 = ['a', 'b', 'c', 'd']
In [17]: l2 = ['a', 'x', 'y', 'c']
In [18]: l1_l2 = [ii for ii in l1 if ii not in l2]
In [19]: l1_l2
Out[19]: ['b', 'd']
In [20]: l2_l1 = [ii for ii in l2 if ii not in l1]
In [21]: l2_l1
Out[21]: ['x', 'y']
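Neither approach above is an order-aware diff like the one the question asks about. If you want that without parsing the -/+ prefixes from ndiff, one option is difflib.SequenceMatcher, whose get_opcodes() returns the edit operations as index ranges. The sketch below assumes that form of output is acceptable; diff_lists is a hypothetical helper name, not part of difflib. Also note that SequenceMatcher looks for long matching blocks and is not guaranteed to produce a strict longest common subsequence or a minimal edit script.

```python
from difflib import SequenceMatcher


def diff_lists(a, b):
    """Return (tag, items_from_a, items_from_b) tuples.

    tag is one of 'equal', 'delete', 'insert' or 'replace'.
    diff_lists is a hypothetical helper, not a difflib function.
    """
    # autojunk=False turns off the heuristic that ignores very frequent items.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()

result = diff_lists(t1, t2)
for row in result:
    print(row)
# ('equal', ['one 1'], ['one 1'])
# ('replace', ['two 2'], ['two 29'])
# ('equal', ['three 3'], ['three 3'])
```

The 'equal' rows together give the common subsequence that was found, and the remaining rows are the changes, so no string prefixes need to be stripped.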