I need to compute the number of differences (the score) of every row against every other row of a full 2D array (the score is needed to compute a 'difference distance' of an array, useful for statistics). Here is a simple example, but I need to do this on huge 2D arrays of ~100 000 rows and thousands of rows, so I'm looking to speed up my naive code:
import numpy

a = numpy.array([[1,2],[1,2],[1,3],[2,3],[3,3]])
score = 0
scoresquare = 0
for i in xrange(len(a)):
    for j in range(i+1, len(a)):
        scoretemp = 0
        if a[i,0]!=a[j,0] and a[i,1]!=a[j,0] and a[i,1]!=a[j,1] and a[i,0]!=a[j,1]:
            # comparison gives two different items
            scoretemp = 2
        elif (a[i]==a[j]).all():
            # identical rows: no difference
            scoretemp = 0
        else:
            # any other case counts as one difference
            scoretemp = 1
        print a[i], a[j], scoretemp, (a[i]==a[j]).all(), (a[i]==a[j]).any()
        score += scoretemp
        scoresquare += scoretemp*scoretemp
print score, scoresquare
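A minimal sketch of a first speed-up, assuming the rows always have exactly two columns as in the example: keep the outer loop over reference rows, but replace the inner loop with vectorised comparisons against all later rows at once. It applies the same rule as the loop above (2 when all four cross comparisons differ, 0 for identical rows, 1 otherwise). The function name `row_by_row_sums` is hypothetical, and the code is written for Python 3.

```python
import numpy as np

def row_by_row_sums(a):
    """Return (score, scoresquare) using the loop's rule; assumes a 2-column array.
    row_by_row_sums is a hypothetical helper name."""
    a = np.asarray(a)
    total = total_sq = 0
    for i in range(len(a) - 1):
        x, y = a[i], a[i + 1:]  # reference row and every later row
        # True where all four cross comparisons differ (the loop's `if` branch)
        disjoint = ((x[0] != y[:, 0]) & (x[1] != y[:, 0]) &
                    (x[1] != y[:, 1]) & (x[0] != y[:, 1]))
        identical = (y == x).all(axis=1)  # the loop's `elif` branch
        s = np.where(disjoint, 2, np.where(identical, 0, 1))
        total += int(s.sum())
        total_sq += int((s * s).sum())
    return total, total_sq

a = np.array([[1, 2], [1, 2], [1, 3], [2, 3], [3, 3]])
print(row_by_row_sums(a))  # (11, 15)
```

This still performs n(n-1)/2 comparisons in total, but only n-1 Python-level iterations, and memory stays proportional to n. Materialising every pair at once would need memory for roughly 5·10^9 pairs at 100 000 rows.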
a[0] is identical to a[1], so their score (number of differences) is 0; a[0] has one difference with a[2] and with a[3], and two differences with a[4]. To compute such a distance (statistics), I need the intermediate score and square-score sums.
reference_row compared_row score
[1 2] [1 2] 0
[1 2] [1 3] 1
[1 2] [2 3] 1
[1 2] [3 3] 2
[1 2] [1 3] 1
[1 2] [2 3] 1
[1 2] [3 3] 2
[1 3] [2 3] 1
[1 3] [3 3] 1
[2 3] [3 3] 1
Sum_score=11 Sum_scoresquare=15
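The table repeats the three [1 2] comparisons because a[0] and a[1] are identical. If the real data contains many repeated rows, one possible further speed-up (a swapped-in technique, not something from the question) is to score only the distinct rows, found with `np.unique(..., axis=0, return_counts=True)`, and weight each pair by the product of the two counts. Pairs of identical rows always score 0, so they contribute nothing. This relies on the rule being symmetric in the two rows, which holds for the loop above because its `if` tests all four cross comparisons. The sketch again assumes two columns, and `weighted_sums` is a hypothetical name.

```python
import numpy as np

def weighted_sums(a):
    """Return (score, scoresquare) by comparing distinct rows only.
    Assumes a 2-column array; weighted_sums is a hypothetical helper name."""
    uniq, counts = np.unique(a, axis=0, return_counts=True)
    i, j = np.triu_indices(len(uniq), k=1)  # every pair of distinct rows
    x, y = uniq[i], uniq[j]
    disjoint = ((x[:, 0] != y[:, 0]) & (x[:, 1] != y[:, 0]) &
                (x[:, 1] != y[:, 1]) & (x[:, 0] != y[:, 1]))
    # Distinct rows are never identical, so each score is 2 or 1
    scores = np.where(disjoint, 2, 1)
    # Each unique pair stands for counts[i] * counts[j] pairs of original rows
    weights = counts[i] * counts[j]
    return int((weights * scores).sum()), int((weights * scores ** 2).sum())

a = np.array([[1, 2], [1, 2], [1, 3], [2, 3], [3, 3]])
print(weighted_sums(a))  # (11, 15)
```

The cost is quadratic in the number of distinct rows rather than the number of rows, so this only pays off when rows repeat a lot; otherwise it still builds every pair in memory.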
My code is quite naive and doesn't take advantage of the full strength of NumPy arrays, so: how can I accelerate such a computation? Thanks for your help.
np.in1d checks, for every element of the first array, whether it appears in the second array, and returns True for a match. So we need to negate the result using ~np.in1d. After that, np.where gives the indices that hold a True value, so len(np.where(...)[0]) gives the total number of mismatches. I hope this will help you:
>>> import numpy as np
>>> a = np.array([[1,2],[1,2],[1,3],[2,3],[3,3]])
>>> res=[len(np.where(~np.in1d(a[p],a[q]))[0]) for p in range(a.shape[0]) for q in range(p+1,a.shape[0])]
>>> res=np.array(res)
>>> Sum_score=sum(res)
>>> Sum_score_square=sum(res*res)
>>> print Sum_score, Sum_score_square
11 15
>>> k=0
>>> for i in range(a.shape[0]):
...     for j in range(i+1,a.shape[0]):
...         print a[i],a[j],res[k]
...         k+=1
...
[1 2] [1 2] 0
[1 2] [1 3] 1
[1 2] [2 3] 1
[1 2] [3 3] 2
[1 2] [1 3] 1
[1 2] [2 3] 1
[1 2] [3 3] 2
[1 3] [2 3] 1
[1 3] [3 3] 1
[2 3] [3 3] 1
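To make the answer's steps concrete, here is a sketch of what each call returns for one pair, followed by the same list comprehension written with `np.isin` and `np.count_nonzero`. `np.isin` is the newer equivalent of `np.in1d` (NumPy 2.0 deprecates `np.in1d`), and `count_nonzero` counts the True values directly instead of building an index array with `np.where`. The name `in1d_scores` is hypothetical and the code is Python 3. Note that this membership test is not the same rule as the question's loop: it ignores order and only counts elements of the first row that are missing from the second, so [1 3] vs [3 1] scores 0 and [1 1] vs [1 2] scores 0, while the loop gives 1 for both. The two rules happen to agree on the example array.

```python
import numpy as np

# One pair, step by step
p, q = np.array([1, 2]), np.array([3, 3])
found = np.isin(p, q)                   # [False False]: is each element of p present in q?
missing = ~found                        # [ True  True]
n_mismatch = np.count_nonzero(missing)  # 2, same as len(np.where(missing)[0])

def in1d_scores(a):
    """Mismatch counts for every pair p < q (hypothetical helper name)."""
    n = len(a)
    return np.array([np.count_nonzero(~np.isin(a[p], a[q]))
                     for p in range(n) for q in range(p + 1, n)])

a = np.array([[1, 2], [1, 2], [1, 3], [2, 3], [3, 3]])
res = in1d_scores(a)
print(res.sum(), (res * res).sum())  # 11 15
```

Like the original answer, this still runs a Python-level loop over every pair, so it does not remove the per-pair overhead for 100 000 rows.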