
Finding the diff of two lists of strings

I'm trying to find a diff (in the longest-common-subsequence sense) between two lists of strings. I suspect difflib could be useful here, but difflib.ndiff annotates its output with prefix codes such as -, + and ?. For instance:

from difflib import ndiff
t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
d = list(ndiff(t1, t2))
print(d)

['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']

Is parsing the output and stripping the prefix codes the right way? Is this the proper Pythonic way of diffing lists?

If all you want is the items of the first list that are not in the second, you can convert both lists to sets and take the set difference with the - operator. Note that sets discard duplicates and do not preserve order.

Example:

>>> l1 = [1,2,3,4,5]
>>> l2 = [4,5,6,7,8]
>>> print(list(set(l1) - set(l2)))
[1, 2, 3]
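
Applied to the strings from the question, the same idea might look like the sketch below. The extra repeated 'two 2' entry in t1 is an assumption added here to show that duplicates collapse; the names only_in_t1, only_in_t2 and either are illustrative only.

```python
# Lists modelled on the question; the trailing duplicate is an added assumption.
t1 = ['one 1', 'two 2', 'three 3', 'two 2']
t2 = ['one 1', 'two 29', 'three 3']

# Items that appear only in t1 (the repeated 'two 2' is reported once).
only_in_t1 = set(t1) - set(t2)

# Items that appear only in t2.
only_in_t2 = set(t2) - set(t1)

# Symmetric difference: items that appear in exactly one of the two lists.
either = set(t1) ^ set(t2)

print(only_in_t1)
print(only_in_t2)
print(sorted(either))
```

Because the results are sets, positions and repeat counts are lost, so this suits a membership check rather than a line-by-line diff.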

Using a list comprehension, which preserves the original order:

In [16]: l1 = ['a', 'b', 'c', 'd']

In [17]: l2 = ['a', 'x', 'y', 'c']

In [18]: l1_l2 = [ii for ii in l1 if ii not in l2]

In [19]: l1_l2
Out[19]: ['b', 'd']

In [20]: l2_l1 = [ii for ii in l2 if ii not in l1]

In [21]: l2_l1 
Out[21]: ['x', 'y']
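
If an order-aware diff is what's needed, as the question suggests, one option is to skip ndiff's text output and read the structured opcodes from difflib.SequenceMatcher instead, so nothing has to be parsed back out of strings. The following is a minimal sketch, not a definitive answer: diff_lists is a hypothetical helper name, and SequenceMatcher finds matching blocks heuristically, so the common part is not guaranteed to be a true longest common subsequence.

```python
from difflib import SequenceMatcher

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()


def diff_lists(a, b):
    """Hypothetical helper: return (common, removed, added), preserving order."""
    common, removed, added = [], [], []
    # autojunk=False disables the "popular element" heuristic for long inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            common.extend(a[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' all reduce to slices of a and b;
            # for 'delete' the b slice is empty, for 'insert' the a slice is.
            removed.extend(a[i1:i2])
            added.extend(b[j1:j2])
    return common, removed, added


common, removed, added = diff_lists(t1, t2)
print(common)   # ['one 1', 'three 3']
print(removed)  # ['two 2']
print(added)    # ['two 29']
```

Unlike the set and list-comprehension approaches, this keeps track of position, so an item that moves or repeats is reported where it actually changes.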

