
In the Python for loop, I can't get the expected counter value

I am working on a speaker recognition project using the resemblyzer module. The folder structure is as follows: a directory speakers contains the folders spk1, spk2, etc., and each folder holds several recordings of a single speaker (one folder per speaker). During processing, I built a dictionary whose keys are those same directory names (spk1, spk2, etc.) and whose values are the embeddings (representations) of that speaker's recordings.

Next, I want to compare each speaker's recordings in pairs to calculate an accuracy metric (how often the system makes mistakes). In the first stage of the script below, I build every pairwise combination of a speaker's recordings, staying inside that speaker's own folder.
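As a minimal sketch of this first stage, the snippet below builds the within-speaker pairs with itertools.combinations. The dictionary contents are hypothetical placeholder strings, not the real recordings, and the names pairs_per_speaker and total_pairs are introduced only for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical stand-in for the real dictionary: folder name -> recordings.
speaker_wavs = {
    "spk1": ["rec_a", "rec_b", "rec_c"],
    "spk2": ["rec_d", "rec_e"],
}

# For each speaker, list every unordered pair of that speaker's recordings.
pairs_per_speaker = {
    speaker: list(combinations(recordings, 2))
    for speaker, recordings in speaker_wavs.items()
}

# A speaker with n recordings contributes n * (n - 1) / 2 pairs.
total_pairs = sum(len(pairs) for pairs in pairs_per_speaker.values())
expected_pairs = sum(comb(len(recs), 2) for recs in speaker_wavs.values())

print(pairs_per_speaker["spk1"])
print(total_pairs, expected_pairs)
```

Pairs are never formed across folders here, so every pair is a same-speaker comparison.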

The second stage writes the embeddings of the recordings into similarity matrices and compares them using cosine similarity.
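To illustrate why a plain inner product can serve as cosine similarity, here is a small sketch with random vectors standing in for voice embeddings. The vector length of 256 and the helper l2_normalize are assumptions made for the example; the point is only that, once both vectors have unit L2 norm, their dot product equals their cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=256)  # placeholder for one embedding
b = rng.normal(size=256)  # placeholder for another embedding


def l2_normalize(v):
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    return v / np.linalg.norm(v)


a_n, b_n = l2_normalize(a), l2_normalize(b)

# Cosine similarity computed from its definition on the raw vectors.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Dot product of the normalized vectors: the same quantity.
dot_of_normalized = np.dot(a_n, b_n)

# Wrapping each vector as a single-row array yields a 1x1 similarity matrix.
sim_matrix = np.inner(a_n[np.newaxis, :], b_n[np.newaxis, :])

print(np.isclose(cosine, dot_of_normalized), sim_matrix.shape)
```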

In the final (third) stage, I calculate the accuracy. The loop goes through 46 combinations, but for some reason it counts 0 matches, even though printing the similarity matrices shows that they obviously coincide. What is wrong with the for loop?

I ran into a similar problem earlier when solving the same task with the speechbrain library. In that case, the counting error was caused by a tensor data type that produces the logical values True or False. This seems to me to be a different case.

Code:

!pip install resemblyzer
! pip install umap
import numpy as np
from itertools import combinations

num_true=0
num_total=0

# Stage 1 - iterate over the dictionary values (i.e. the speakers' records) and build a list of all pairwise combinations:
# (speaker 1 record 1 - speaker 1 record 2), (speaker 1 record 1 - speaker 1 record 3), etc.
for elems in speaker_wavs.values():
  # print(elems[0])
  tuples = list(combinations(elems, 2)) # all pairwise combinations for this speaker
 
  # Stage 2 - create embeddings of records
  for single in tuples:         # we go through each combination in the list
      # the .embed_utterance() function creates voice embeddings
      embeds = (np.array( [encoder.embed_utterance(single[0]) ] ), np.array([encoder.embed_utterance(single[1]) ] ) )
      num_total+=1

      # Let's calculate the similarity matrix. The similarity of two embeddings is simply their dot product,
      # because the similarity metric is cosine similarity and the embeddings are already L2-normalized.

      # Short version:
      utt_sim_matrix = np.inner(embeds[0], embeds[1])    # The inner product of two arrays 
      # print('Matrix_1', utt_sim_matrix) # print it out if you need to visually compare embeddings
    
      # Long, detailed version:
      utt_sim_matrix2 = np.zeros( (len(embeds[0]), len(embeds[1]) ) )
      for i in range(len(embeds[0])):
        for j in range(len(embeds[1])):
          # The @ notation is equivalent to np.dot(embeds[0][i], embeds[1][j])
          utt_sim_matrix2[i, j] = embeds[0][i] @ embeds[1][j]
          # print('Matrix_2', utt_sim_matrix2)  # print it out if you need to visually compare embeddings

          # Returns True if two arrays are equal in elements within the tolerance       
          if np.allclose(utt_sim_matrix, utt_sim_matrix2) == 'True':
            num_true+=1

# print(num_true)   # now we get 0
# print(num_total)  # now we get 46

# Stage 3 - counting the accuracy metric:
if num_total !=0:
  accuracy = num_true/num_total
  print(accuracy)
else:
  print("You can't divide by zero")

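I solved the problem like this:

          if str(np.allclose(utt_sim_matrix, utt_sim_matrix2)) == 'True':
            num_true+=1

Here I explicitly converted the result to a string. np.allclose returns a Boolean, and a Boolean never compares equal to the string 'True', so the original condition was always False and the counter stayed at 0.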
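A short sketch of what happens in the comparison, using two small placeholder matrices rather than real similarity matrices. It also shows an alternative that is not part of the solution above: since np.allclose already returns a Boolean, the result can be used directly as the condition, with no string conversion.

```python
import numpy as np

# Placeholder 1x1 matrices standing in for the two similarity matrices.
m1 = np.array([[0.83]])
m2 = np.array([[0.83]])

result = np.allclose(m1, m2)          # a Boolean, not a string

broken_check = (result == 'True')     # Boolean vs. string: always False
str_check = (str(result) == 'True')   # works, but takes a detour via str()

# Alternative: use the Boolean directly as the condition.
num_true = 0
if np.allclose(m1, m2):
    num_true += 1

print(result, broken_check, str_check, num_true)
```

Using the Boolean directly avoids depending on the exact text that str() produces for the value.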
