
Comparison of cell arrays in MATLAB

I have two cell arrays, one storing the unigrams and one storing the bigrams that I extracted from a text file. I need to compare each unigram with the bigrams to find the count, and later the probability, of each unigram occurring in the bigrams. Can anyone help me solve this? I have already tried strcmp, but it is not working. My code is below:

for i = 1
    for j = 1:bigramRow
        % split the j-th bigram into its tokens (<s>, </s>, or a word)
        bigram1 = regexp(splitBigramCellsA{j},'<s>|\w*|</s>','match');
        % all tokens of the bigram except the last one
        b1 = cellfun(@(x,y)[x], bigram1(1:end-1)','un',0)
        % logical array with one entry per unigram
        match = strcmp(splitUnigramCellsA, splitBigramCellsA{j,1});

        if match ==1
            bigram1count = splitBigramCellsB{j};
            unigram1count = splitUnigramCellsB{j};
            disp(bigram1count)
            disp(unigram1count)
        end
    end
end
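
One likely reason the loop above prints nothing: strcmp applied to a cell array returns a logical array with one entry per cell, and an if on an array is true only when every entry is true. The loop also compares each unigram with the whole bigram string, which cannot match when the bigram contains two words. The following is a minimal sketch of a per-bigram lookup, not the asker's final code. It assumes the counts are stored as numbers inside the B cell arrays, that the toy data below (hypothetical) resembles the real arrays, and that the wanted probability is count(bigram) / count(first word).

```matlab
% Toy data (hypothetical); the real arrays come from the asker's text file.
splitUnigramCellsA = {'<s>'; 'the'; 'cat'; '</s>'};
splitUnigramCellsB = {1; 2; 1; 1};          % unigram counts (assumed numeric)
splitBigramCellsA  = {'<s> the'; 'the cat'; 'cat </s>'};
splitBigramCellsB  = {1; 1; 1};             % bigram counts (assumed numeric)

prob = zeros(numel(splitBigramCellsA), 1);
for j = 1:numel(splitBigramCellsA)
    % Tokens of the j-th bigram, e.g. {'the', 'cat'}
    tokens = regexp(splitBigramCellsA{j}, '<s>|</s>|\w+', 'match');
    % Logical array with one entry per unigram
    match = strcmp(splitUnigramCellsA, tokens{1});
    if any(match)
        k = find(match, 1);                 % position of the first word
        % Assumed definition: count(bigram) / count(first word)
        prob(j) = splitBigramCellsB{j} / splitUnigramCellsB{k};
    end
end
disp(prob)
```

Using any(match) and find(match, 1) turns the logical array into a single test and a single position, so the unigram count is looked up by the matched word rather than by the bigram index j.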

If you can fit the text in memory, you could do the following:

  1. create a cell array of all words (in order)
  2. call unique on the cell array, and capture the third output as well. That output is the original text represented as an array of indices, where each index refers to a unigram.
  3. create all bigrams as bigrams = [indices(1:2:largestEven),indices(2:2:largestEven);indices(2:2:largestOdd),indices(3:2:largestOdd)] , where largestEven is 2*floor(length(indices)/2) and largestOdd is 2*floor((length(indices)-1)/2)+1 , i.e. the largest even and the largest odd position that do not exceed length(indices) . This assumes indices is a column vector, so that each row of bigrams holds one pair of adjacent words.
  4. calculate, e.g., the frequency of each unigram in the bigrams as tabulate(bigrams(:))
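
The steps above can be sketched as follows. This is only an illustration under stated assumptions: the word list is hypothetical, the bigrams are built with the simpler [indices(1:end-1), indices(2:end)] (the same adjacent pairs, listed in text order), and accumarray replaces tabulate so that only base MATLAB functions are used (tabulate belongs to the Statistics and Machine Learning Toolbox). The conditional probability at the end is one possible reading of what the question asks for.

```matlab
% Hypothetical token list; in practice this would be read from the text file.
words = {'<s>', 'the', 'cat', 'saw', 'the', 'cat', '</s>'};

% uniqueWords(indices) reproduces words; indices(:) forces a column vector
[uniqueWords, ~, indices] = unique(words);
indices = indices(:);

% Every pair of adjacent words, one bigram per row (as unigram indices)
bigrams = [indices(1:end-1), indices(2:end)];

% Count how often each distinct bigram occurs
[uniqueBigrams, ~, bigramIdx] = unique(bigrams, 'rows');
bigramCounts = accumarray(bigramIdx, 1);

% Count how often each unigram occurs in the text
unigramCounts = accumarray(indices, 1);

% Assumed definition: P(second | first) = count(bigram) / count(first)
condProb = bigramCounts ./ unigramCounts(uniqueBigrams(:, 1));

for k = 1:size(uniqueBigrams, 1)
    fprintf('%s %s: %d / %d = %.2f\n', ...
        uniqueWords{uniqueBigrams(k, 1)}, uniqueWords{uniqueBigrams(k, 2)}, ...
        bigramCounts(k), unigramCounts(uniqueBigrams(k, 1)), condProb(k));
end
```

Working on integer indices instead of strings means every comparison is numeric, so no strcmp calls are needed once unique has been applied.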
