I have two lists:
nodes = [[nodeID1, x1, y1, z1],[nodeID2, x2, y2, z2],...,[nodeIDn, xn, yn, zn]]
and
subsetA_nodeID = [[nodeIDa], [nodeIDb], ...]
I'd like to compare these two lists and return a new list with the nodeIDs, x, y, z of the nodes that do not match the nodeIDs of subsetA_nodeID.
I could do it like:
new_list = []
for line in nodes:
    nodeID, x, y, z = line
    for line2 in subsetA_nodeID:
        if line2[0] == nodeID:
            break  # nodeID is in the subset, so skip this node
    else:
        # the inner loop finished without a match
        new_list.append(line)
This code is totally inefficient. I'm looking for a fast way to do this. I've tried dictionaries but I couldn't see a way to use them correctly. Any ideas?
Thanks!
I'd suggest first flattening subsetA_nodeID.
ssa_flat = [x for sublist in subsetA_nodeID for x in sublist]
Or, if each sublist in subsetA_nodeID is guaranteed to contain only one element:
ssa_flat = [x[0] for x in subsetA_nodeID]
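If the node IDs are hashable, consider making ssa_flat a set, since a membership test on a set takes constant time on average instead of scanning the whole list.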
ssa_flat = set(ssa_flat)
Then you can create your new list like this:
lst = [x[0] for x in nodes if x[0] not in ssa_flat]
Edit: If lst should contain the [NodeID, x, y, z] lists, simply change the first x[0] to x in the last list comprehension.
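Putting the steps of this answer together, a minimal sketch could look like the following. The sample data is made up for illustration, and it assumes every sublist of subsetA_nodeID holds exactly one hashable ID.

```python
# Made-up sample data in the shape described in the question.
nodes = [
    [1, 0.0, 0.0, 0.0],
    [2, 1.0, 0.0, 0.0],
    [3, 0.0, 1.0, 0.0],
    [4, 0.0, 0.0, 1.0],
]
subsetA_nodeID = [[2], [4]]

# Flatten the one-element sublists straight into a set of IDs.
ssa_flat = {sub[0] for sub in subsetA_nodeID}

# Keep the full [nodeID, x, y, z] rows whose ID is not in the subset.
new_list = [node for node in nodes if node[0] not in ssa_flat]

print(new_list)  # [[1, 0.0, 0.0, 0.0], [3, 0.0, 1.0, 0.0]]
```

The order of nodes is preserved, because only the lookup structure is a set, not the result.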
numpy is your friend for stuff like this ...
import itertools
import numpy
a = numpy.array(nodes)
list_of_ids = list(itertools.chain(*subsetA_nodeID))  # flatten into a real list, not an iterator
mask = ~numpy.isin(a[:, 0], list_of_ids)  # column 0 holds the node IDs; isin supersedes the older in1d
print(a[mask])  # show the rows whose ID is not in the subset
For a pure-Python membership test I'd also suggest making list_of_ids a set, since set lookups are much faster. When passing it to numpy, though, keep it a list or array: numpy does not unpack a set into its elements.
You could try using a list comprehension to look through them all:
new_list = [node for node in nodes if node[0] not in subsetA_nodeID]
although I am not sure how efficient this is compared with the other answers presented. As stated in another answer, subsetA_nodeID first needs to be flattened into a 1-D list for this to work, because node[0] is compared against its elements.
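To get a feel for the efficiency question, here is a rough timing sketch that runs the same comprehension against a flattened list and against a set. The data sizes and the helper names with_list and with_set are arbitrary choices for illustration, and the absolute timings will vary by machine.

```python
import timeit

# Synthetic data: 2000 nodes, every second ID belongs to the subset.
nodes = [[i, float(i), 0.0, 0.0] for i in range(2000)]
subsetA_nodeID = [[i] for i in range(0, 2000, 2)]

flat_list = [sub[0] for sub in subsetA_nodeID]  # 1-D list of IDs
flat_set = set(flat_list)                       # same IDs, hashed

def with_list():
    # "not in" on a list scans it element by element
    return [node for node in nodes if node[0] not in flat_list]

def with_set():
    # "not in" on a set is a hash lookup
    return [node for node in nodes if node[0] not in flat_set]

# Both variants must select the same rows.
assert with_list() == with_set()

print("list:", timeit.timeit(with_list, number=3))
print("set: ", timeit.timeit(with_set, number=3))
```

The gap between the two widens as the subset grows, since the list version does work proportional to len(nodes) * len(subset).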
Iterating through the entire thing is probably not a good idea for large problems. Besides @JoranBeasley's suggestion, pandas is also an alternative:
In [52]:
import pandas as pd
nodes = [['nodeID1', 'x1', 'y1', 'z1'],['nodeID2', 'x2', 'y2', 'z2'],['nodeIDn', 'xn', 'yn', 'zn']]
subsetA_nodeID = [['nodeID1'], ['nodeID2']]
subsetA_nodeIDa = ['nodeID1', 'nodeID2'] #use itertools.chain to get this
In [53]:
df = pd.DataFrame(nodes)
print(df)
df.set_index(0, inplace=True)  # use the node ID column as the index
print(df)
         0   1   2   3
0  nodeID1  x1  y1  z1
1  nodeID2  x2  y2  z2
2  nodeIDn  xn  yn  zn
          1   2   3
0
nodeID1  x1  y1  z1
nodeID2  x2  y2  z2
nodeIDn  xn  yn  zn
In [54]:
print(df.loc[subsetA_nodeIDa])  # .loc replaces the removed .ix indexer
          1   2   3
nodeID1  x1  y1  z1
nodeID2  x2  y2  z2
In [55]:
list(map(list, df.loc[subsetA_nodeIDa].values))
Out[55]:
[['x1', 'y1', 'z1'], ['x2', 'y2', 'z2']]
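Note that the session above selects the rows whose IDs are in the subset, while the question asks for the nodes that are not in it. A minimal sketch of that complement, keeping the ID column and using Series.isin with a negated mask, might look like this (same sample data as above; the names ids and remaining are just illustrative):

```python
import itertools
import pandas as pd

nodes = [['nodeID1', 'x1', 'y1', 'z1'],
         ['nodeID2', 'x2', 'y2', 'z2'],
         ['nodeIDn', 'xn', 'yn', 'zn']]
subsetA_nodeID = [['nodeID1'], ['nodeID2']]

# Flatten the nested ID list into a plain list.
ids = list(itertools.chain.from_iterable(subsetA_nodeID))

df = pd.DataFrame(nodes)

# Column 0 holds the node IDs; ~ inverts the boolean mask,
# so only rows whose ID is NOT in ids are kept.
remaining = df[~df[0].isin(ids)]

# Back to a list of [nodeID, x, y, z] lists.
new_list = remaining.values.tolist()
print(new_list)  # [['nodeIDn', 'xn', 'yn', 'zn']]
```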