Compare two lists and return the non-matching items

I have two lists:

nodes = [[nodeID1, x1, y1, z1],[nodeID2, x2, y2, z2],...,[nodeIDn, xn, yn, zn]]

and

subsetA_nodeID = [[nodeIDa], [nodeIDb], ...]

I'd like to compare these two lists and return a new list with the nodeID, x, y, z of every node whose nodeID does not appear in subsetA_nodeID.

I could do it like:

new_list = []
for line in nodes:
    nodeID = line[0]
    matched = False
    # scan the whole subset for every node (quadratic)
    for line2 in subsetA_nodeID:
        if line2[0] == nodeID:
            matched = True
            break
    if not matched:
        new_list.append(line)

This code is totally inefficient. I'm looking for a fast way to do this. I've tried dictionaries but I couldn't see a way to use them correctly. Any ideas?

Thanks!

I'd suggest first flattening subsetA_nodeID.

ssa_flat = [x for sublist in subsetA_nodeID for x in sublist] 

Or, if each sublist in subsetA_nodeID is guaranteed to contain only one element:

ssa_flat = [x[0] for x in subsetA_nodeID]

If the node IDs are hashable, consider making ssa_flat a set, so that each membership test is a constant-time lookup instead of a scan of the list.

ssa_flat = set(ssa_flat)

Then you can create your new list like this:

lst = [x[0] for x in nodes if x[0] not in ssa_flat]

Edit: If lst should contain the [NodeID, x, y, z] lists, simply change the first x[0] to x in the last list comprehension.
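
Putting the steps above together, here is a minimal sketch that keeps the full [nodeID, x, y, z] rows. The sample data and the helper name nodes_not_in_subset are made up for illustration, and it assumes every sublist of subsetA_nodeID holds exactly one hashable ID.

```python
def nodes_not_in_subset(nodes, subsetA_nodeID):
    """Return the rows of nodes whose ID (first element) is not in subsetA_nodeID."""
    # Flatten [[id_a], [id_b], ...] into a set for fast membership tests.
    excluded = {sublist[0] for sublist in subsetA_nodeID}
    # Keep the whole row, not just the ID.
    return [row for row in nodes if row[0] not in excluded]


# Hypothetical sample data.
nodes = [[1, 0.0, 0.0, 0.0], [2, 1.0, 0.0, 0.0], [3, 0.0, 1.0, 0.0]]
subsetA_nodeID = [[1], [3]]

remaining = nodes_not_in_subset(nodes, subsetA_nodeID)
print(remaining)  # [[2, 1.0, 0.0, 0.0]]
```

Building the set costs one pass over subsetA_nodeID and the filter costs one pass over nodes, so the work grows with the sum of the two lengths rather than their product.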

numpy is your friend for stuff like this ...

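import itertools
import numpy

a = numpy.array(nodes)
list_of_ids = list(itertools.chain(*subsetA_nodeID))  # flatten into a list
# numpy.isin supersedes the older numpy.in1d; the node IDs are in column 0
mask = ~numpy.isin(a[:, 0], list_of_ids)  # True where the ID is not in the subset
print(a[mask])  # show the rows whose ID is not in the subset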

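I also suggest making list_of_ids a set when you test membership in plain Python, since set lookups are much faster than list lookups. For the numpy call, keep it a list or an array: numpy's membership functions expect an array-like and do not treat a set as a collection of values.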
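
As a small runnable sketch of the numpy route: the sample data below is hypothetical and uses numeric IDs so that the array has a single dtype, and it assumes a NumPy version that provides numpy.isin (1.13 or later). The IDs start out as a set and are converted to a list before being handed to numpy.

```python
import numpy as np

# Hypothetical sample data: numeric IDs in column 0, coordinates after it.
nodes = [[1, 0.0, 0.0, 0.0], [2, 1.0, 0.0, 0.0], [3, 0.0, 1.0, 0.0]]
subsetA_nodeID = [[1], [3]]

a = np.array(nodes)                               # shape (3, 4), float64
id_set = {sub[0] for sub in subsetA_nodeID}       # fine for plain Python lookups
mask = ~np.isin(a[:, 0], list(id_set))            # numpy wants a list/array here
not_in_subset = a[mask]                           # rows whose ID is not in the subset

print(not_in_subset.tolist())  # [[2.0, 1.0, 0.0, 0.0]]
```

Note that mixing string IDs with float coordinates in one array would make numpy convert everything to strings, so this approach fits best when the IDs are numeric.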

You could try using a list comprehension to look through them all:

new_list = [node for node in nodes if node[0] not in subsetA_nodeID]

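although I am not sure how efficient this is compared with the other answers presented. As stated in another answer, subsetA_nodeID has to be flattened into a 1-D list for this to work: in the nested [[nodeIDa], [nodeIDb], ...] form, node[0] is compared against one-element lists and never matches, so no node is filtered out.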
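
To make the flattening point concrete, here is a short sketch with hypothetical sample data that runs the comprehension against both the nested and the flattened form.

```python
# Hypothetical sample data.
nodes = [["n1", 0.0, 0.0, 0.0], ["n2", 1.0, 0.0, 0.0], ["n3", 0.0, 1.0, 0.0]]
subsetA_nodeID = [["n1"], ["n3"]]

# Nested form: "n1" is compared against ["n1"], which is never equal,
# so nothing is filtered out.
unfiltered = [node for node in nodes if node[0] not in subsetA_nodeID]

# Flattened form: the membership test now compares an ID against IDs.
flat_ids = [sub[0] for sub in subsetA_nodeID]
new_list = [node for node in nodes if node[0] not in flat_ids]

print(len(unfiltered))  # 3
print(new_list)         # [['n2', 1.0, 0.0, 0.0]]
```

Each `not in` against a list is still a linear scan, so for large inputs flat_ids is better turned into a set.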

Iterating through the entire thing is probably not a good idea for large problems. Besides @JoranBeasley's suggestion, pandas is also an alternative:

In [52]:
import pandas as pd
nodes = [['nodeID1', 'x1', 'y1', 'z1'],['nodeID2', 'x2', 'y2', 'z2'],['nodeIDn', 'xn', 'yn', 'zn']]
subsetA_nodeID = [['nodeID1'], ['nodeID2']]
subsetA_nodeIDa = ['nodeID1', 'nodeID2'] #use itertools.chain to get this
In [53]:

df=pd.DataFrame(nodes)
print(df)
df.set_index(0, inplace=True)
print(df)
         0   1   2   3
0  nodeID1  x1  y1  z1
1  nodeID2  x2  y2  z2
2  nodeIDn  xn  yn  zn
          1   2   3
0                  
nodeID1  x1  y1  z1
nodeID2  x2  y2  z2
nodeIDn  xn  yn  zn
In [54]:

print(df.loc[subsetA_nodeIDa])  # .loc replaces .ix, which newer pandas versions no longer provide
          1   2   3
nodeID1  x1  y1  z1
nodeID2  x2  y2  z2
In [55]:

list(map(list, df.loc[subsetA_nodeIDa].values))
Out[55]:
[['x1', 'y1', 'z1'], ['x2', 'y2', 'z2']]
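
The session above selects the rows that are in the subset, whereas the question asks for the rows that are not. One way to get the complement, sketched here with the same sample data, is to negate Index.isin; the variable names rest and result are only illustrative.

```python
import pandas as pd

nodes = [['nodeID1', 'x1', 'y1', 'z1'],
         ['nodeID2', 'x2', 'y2', 'z2'],
         ['nodeIDn', 'xn', 'yn', 'zn']]
subsetA_nodeIDa = ['nodeID1', 'nodeID2']

df = pd.DataFrame(nodes).set_index(0)

# Keep only the rows whose index (the node ID) is NOT in the subset.
rest = df[~df.index.isin(subsetA_nodeIDa)]

# Move the node ID back into the first column and convert to nested lists.
result = rest.reset_index().values.tolist()
print(result)  # [['nodeIDn', 'xn', 'yn', 'zn']]
```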
