I have a cell array called reducedWords (n x 1) that contains the words of my document. I need to find the most frequent word. My question is: is there a built-in function I can use for that, or should I write my own?
reducedWords = allWords;
unneccesaryWords = {'in','on','at','from','with','a','as','if','of',...
                    'that','and','the','or','else','to','an'};
kk = 1;
while kk <= length(reducedWords)
    for cc = 1:length(unneccesaryWords)
        if strcmp(reducedWords{kk},unneccesaryWords{cc})==1
            reducedWords = { reducedWords{1:kk-1} reducedWords{kk+1:end} };
            kk = 1;
        end
    end
    kk = kk + 1;
end
Best regards
You can use tabulate(), which creates a frequency table of the data in a vector.
Example:
words = {'a','a','bb','bb','bb','bb','ccc'};
tabulate(words)   % called without an output argument, it prints the table
Result:
  Value    Count   Percent
      a        2     28.57%
     bb        4     57.14%
    ccc        1     14.29%
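To get the single most frequent word rather than the whole table, the output of tabulate can be captured and searched for the largest count. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (which provides tabulate) is installed; the variable names counts, maxCount and mostFrequentWord are only illustrative.

```matlab
% Sketch: pick the most frequent word from tabulate's output.
% Assumes the Statistics and Machine Learning Toolbox is available.
words = {'a','a','bb','bb','bb','bb','ccc'};

tab = tabulate(words);            % for cell input: n-by-3 cell array {value, count, percent}
counts = cell2mat(tab(:,2));      % counts as a numeric column vector
[maxCount, idx] = max(counts);    % on ties, max returns the first maximum
mostFrequentWord = tab{idx,1};    % 'bb' for this input, with maxCount equal to 4
```

If several words share the highest count, only the first one in the table is returned; comparing counts against maxCount would list all of them.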
Alternatively, you can use CountMember.m.
Approach 1
Code
words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
[array1, ~, ind1] = unique(words_cell_array,'stable');
[~,max_ind] = max(histc(ind1, 1:numel(array1)));
max_occuring_word = array1(max_ind) % max_ind indexes the unique list, not the original array
Output
words_cell_array =
'cat' 'goat' 'man' 'woman' 'child' 'man'
max_occuring_word =
'man'
Approach 2
Code
words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
[~, ~, ind1] = unique(words_cell_array,'stable');
[~,max_ind] = max(sum(bsxfun(@eq,ind1,ind1'),1));
max_occuring_word = words_cell_array(max_ind)
Approach 3: If you are looking for some stats about the cell array of words
Code
words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
[Words, ~, ind1] = unique(words_cell_array,'stable');
Count = histc(ind1, 1:numel(Words));
Percent = Count*100/numel(words_cell_array);
Output
words_cell_array =
'man' 'goat' 'man' 'woman' 'goat' 'man'
Words =
'man' 'goat' 'woman'
Count =
3 2 1
Percent =
50.0000 33.3333 16.6667
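The counts above can also be sorted to list the N most frequent words instead of only the top one. This is a rough sketch using accumarray in place of histc (which newer MATLAB releases discourage); N and the names uniqueWords, topWords and topCounts are assumptions made for illustration.

```matlab
% Sketch: list the N most frequent words in a cell array of words.
words = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
N = 2;                                            % hypothetical number of words to report

[uniqueWords, ~, idx] = unique(words, 'stable');  % idx maps each word to its unique entry
counts = accumarray(idx(:), 1);                   % occurrences of each unique word
[sortedCounts, order] = sort(counts, 'descend');  % highest count first

k = min(N, numel(order));                         % guard against N exceeding the vocabulary
topWords  = uniqueWords(order(1:k));              % {'man','goat'} for this input
topCounts = sortedCounts(1:k);                    % [3; 2] for this input
```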