Comparing cell arrays in MATLAB

Question

I have two cell arrays, one storing the unigrams and one storing the bigrams that I extracted from a text file. Now I have to compare each unigram with the bigrams to find the count, and later the probability, of each unigram present in the bigrams. Can anyone please help me solve this problem? I have already tried strcmp, but it is not working. My code is below:

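for i = 1
    for j = 1:bigramRow
        % Split the j-th bigram into tokens: <s>, </s>, or a run of word characters
        bigram1 = regexp(splitBigramCellsA{j},'<s>|\w*|</s>','match');
        % Keep every token except the last one (b1 is not used below)
        b1 = cellfun(@(x,y)[x], bigram1(1:end-1)','un',0)
        % Compare every unigram with the whole bigram string; match is a logical array
        match = strcmp(splitUnigramCellsA, splitBigramCellsA{j,1});

        % Note: with an array condition, this branch runs only if all elements are true
        if match == 1
            bigram1count = splitBigramCellsB{j};
            unigram1count = splitUnigramCellsB{j};
            disp(bigram1count)
            disp(unigram1count)
        end
    end
end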
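
A minimal sketch of why a comparison like the one above may never succeed, using made-up sample data. The arrays unigrams and bigrams are hypothetical stand-ins for splitUnigramCellsA and splitBigramCellsA, and the space-separated bigram format is an assumption. strcmp compares whole strings, so a two-word bigram never equals a one-word unigram, and an if on a logical array is true only when every element is true.

```matlab
% Hypothetical sample data standing in for the asker's cell arrays
unigrams = {'<s>'; 'the'; 'cat'; 'sat'};       % one unigram per row
bigrams  = {'<s> the'; 'the cat'; 'cat sat'};  % words separated by a space (assumed format)

j = 2;                                         % look at the bigram 'the cat'

% Whole-string comparison: 'the cat' is not equal to any single unigram
whole = strcmp(unigrams, bigrams{j});          % 4x1 logical, all false

% Split the bigram into words first, then compare only its first word
tokens = regexp(bigrams{j}, '<s>|</s>|\w+', 'match');
first  = strcmp(unigrams, tokens{1});          % true only at the row holding 'the'

% Test the logical array with any(), and use find() to get the matching row
if any(first)
    idx = find(first, 1);                      % row index of the matching unigram
    fprintf('bigram "%s" starts with unigram #%d (%s)\n', ...
        bigrams{j}, idx, unigrams{idx});
end
```

With the row index idx in hand, the count for the matching unigram can be read from the same row of the count array, rather than from row j of the bigram loop.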

Answer 1

If you can fit the text in memory, you could do the following:

Create a cell array of all words (in order).
Call unique on the cell array and capture the third output as well. That output is the original text represented as an array of indices, where each index refers to a unigram.
Create all bigrams as bigrams = [indices(1:2:largestEven),indices(2:2:largestEven);indices(2:2:largestOdd),indices(3:2:largestOdd)], where largestEven is 2*floor(length(indices)/2) and largestOdd is 2*floor((length(indices)-1)/2)+1, i.e. the largest even and odd numbers not exceeding length(indices). This assumes indices is a column vector.
Calculate, e.g., the frequency of each unigram in the bigrams with tabulate(bigrams(:)).
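
The steps above can be sketched roughly as follows. The word list is invented for illustration, and all variable names are my own. Two substitutions are made and are not part of the answer: the bigram step is written as a single shifted slice, which should yield the same adjacent pairs the even/odd split is aiming for, and accumarray replaces tabulate, which requires the Statistics and Machine Learning Toolbox. The conditional probability at the end is only one possible reading of the "probability" the question asks for.

```matlab
% Hypothetical input: the whole text already split into words, in order
words = {'<s>'; 'the'; 'cat'; 'sat'; 'on'; 'the'; 'mat'; '</s>'};

% Unique unigrams, plus the text re-expressed as indices into them
[unigrams, ~, indices] = unique(words);
indices = indices(:);                          % force a column vector

% Every pair of adjacent words, one bigram per row
bigrams = [indices(1:end-1), indices(2:end)];

% How often each unigram appears anywhere in a bigram
nUni          = numel(unigrams);
countInBigram = accumarray(bigrams(:), 1, [nUni 1]);

% Count of each distinct bigram
[uniqueBigrams, ~, bigramId] = unique(bigrams, 'rows');
bigramCount = accumarray(bigramId, 1);

% Assumed interpretation: P(second word | first word)
firstCount = accumarray(bigrams(:,1), 1, [nUni 1]);  % times each unigram starts a bigram
condProb   = bigramCount ./ firstCount(uniqueBigrams(:,1));

for k = 1:size(uniqueBigrams, 1)
    fprintf('%s %s: count %d, P = %.2f\n', ...
        unigrams{uniqueBigrams(k,1)}, unigrams{uniqueBigrams(k,2)}, ...
        bigramCount(k), condProb(k));
end
```

Working with integer indices instead of strings keeps every comparison numeric, so no strcmp calls are needed once unique has been run.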

Answer 1 posted 2016-01-20 13:11:21
