How to count how many times each individual word appears in a text file
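I have a challenge in which I need to find out how many times each word appears in a text file. Punctuation and capitalization can be ignored.
My goal is:
This is what I have come up with so far. I'm clearly not yet at the point where I can write code that groups the words by their length.
However, I found this code, which I believe will help me and which I think does what I want, but I can't understand it. Can someone walk me through it?
Here is a working example of what I'm looking for: http://textuploader.com/dq68g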
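// Counts the words in one line of text. CountUniqueWords is presumably a
// constructor defined elsewhere in the code the question found (not shown).
CountUniqueWords.prototype.countWords = function (line) {
  var self = this;
  // Reuse the counts already stored on the instance, or start a new table.
  // Object.create(null) has no inherited keys, so a word like "constructor"
  // is not mistaken for an existing entry.
  var uniqueWords = self._uniqueWords || Object.create(null);
  // Find every run of word characters ([A-Za-z0-9_]) that does not start
  // with a digit; match() returns null when there is none, hence the || [].
  var words = line.match(/(?!\d)(\w+\b)/g) || [];
  var word;
  var i;
  for (i = 0; i < words.length; i++) {
    // Lowercase so "The" and "the" are counted together.
    word = words[i].toLowerCase();
    // Add 1 to a word already seen, otherwise start its count at 1.
    uniqueWords[word] = uniqueWords[word] ? uniqueWords[word] + 1 : 1;
  }
  // Keep the table on the instance so the next line adds to the same counts.
  self._uniqueWords = uniqueWords;
  return uniqueWords;
};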
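The snippet above depends on a CountUniqueWords constructor that the question does not show. Here is a minimal runnable sketch of the same idea, assuming a hypothetical constructor that starts each instance with an empty count table; countWords is called once per line of text and keeps adding to that table.

```javascript
// Hypothetical constructor: the original code's definition is not shown.
function CountUniqueWords() {
  // Null-prototype object, so words like "constructor" are not mistaken
  // for keys inherited from Object.prototype.
  this._uniqueWords = Object.create(null);
}

// Adds the words of one line to the running counts and returns them.
CountUniqueWords.prototype.countWords = function (line) {
  // Runs of word characters that do not begin with a digit; [] if none match.
  const words = line.match(/(?!\d)\w+\b/g) || [];
  for (const w of words) {
    const word = w.toLowerCase();
    // Start at 0 for unseen words, then add 1.
    this._uniqueWords[word] = (this._uniqueWords[word] || 0) + 1;
  }
  return this._uniqueWords;
};

// Usage: feed the text one line at a time.
const counter = new CountUniqueWords();
const text = "The cat sat.\nThe cat ran, the dog sat!";
for (const line of text.split("\n")) {
  counter.countWords(line);
}
console.log(counter._uniqueWords);
```

For this sample text, "the" ends up with a count of 3, and "cat" and "sat" with 2 each.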
This will do it:
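// fileContent holds the text of the file as a string
const terms = fileContent
  // lowercase, so "The" and "the" count as one word
  .toLowerCase()
  // replace every non-word character (punctuation etc.) with a space
  .replace(/\W/g, " ")
  // split on runs of spaces, tabs and newlines
  .split(/\s+/)
  // remove the empty strings left at the start or end
  .filter(v => !!v)
  // count all terms; the null-prototype start object keeps words such as
  // "constructor" from colliding with inherited Object.prototype keys
  .reduce((dict, v) => { dict[v] = v in dict ? dict[v] + 1 : 1; return dict; }, Object.create(null));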
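In the chain above, fileContent is assumed to already hold the file's text. In Node.js it could be read with fs.readFileSync; the sketch below wraps that and the same chain in a hypothetical helper named countWordsInFile (the file name in the usage comment is only an example).

```javascript
const fs = require("fs");

// Hypothetical helper: reads a UTF-8 text file and counts its words.
function countWordsInFile(filePath) {
  // Read the whole file into one string.
  const fileContent = fs.readFileSync(filePath, "utf8");
  return fileContent
    .toLowerCase()
    .replace(/\W/g, " ")
    .split(/\s+/)
    .filter(v => !!v)
    .reduce((dict, v) => { dict[v] = v in dict ? dict[v] + 1 : 1; return dict; }, Object.create(null));
}

// Usage (example file name):
// console.log(countWordsInFile("input.txt"));
```

Note that \W treats everything except ASCII letters, digits and the underscore as a separator, so "don't" becomes "don" and "t", and accented letters such as "é" also split words.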
var content = `"The quick brown fox jumps over the lazy dog" is an English-language pangram—a sentence that contains all of the letters of the alphabet. It is commonly used for touch-typing practice, testing typewriters and computer keyboards, displaying examples of fonts, and other applications involving text where the use of all letters in the alphabet is desired. Owing to its brevity and coherence, it has become widely known.`;

console.log(get_terms(content));

function get_terms(corpus) {
  return corpus
    .toLowerCase()
    .replace(/\W/g, " ")
    .split(/\s+/)
    .filter(v => !!v)
    .reduce((dict, v) => { dict[v] = v in dict ? dict[v] + 1 : 1; return dict; }, Object.create(null));
}
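The question also mentions grouping words by length. Assuming that means collecting the counted words into buckets keyed by their number of characters (the exact goal is not shown in the question), one possible sketch follows; groupByLength is a hypothetical name and counts is made-up example data.

```javascript
// Groups [word, count] pairs by word length, shortest length first.
function groupByLength(counts) {
  const groups = new Map();
  for (const [word, count] of Object.entries(counts)) {
    const len = word.length;
    if (!groups.has(len)) groups.set(len, []);
    groups.get(len).push([word, count]);
  }
  // Rebuild the Map with its keys (lengths) in ascending order.
  return new Map([...groups.entries()].sort((a, b) => a[0] - b[0]));
}

const counts = { the: 3, cat: 2, fox: 1, quick: 1 };
for (const [len, words] of groupByLength(counts)) {
  console.log(len, words);
}
```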
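Unfortunately, JavaScript has no built-in dictionary that keeps its entries sorted, for example by count: a Map iterates in insertion order, and a plain object lists integer-like keys in ascending order followed by the other keys in insertion order. To get the words ordered by frequency, you have to sort the entries yourself or implement your own data structure.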
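One way to get the words ordered by frequency is to turn the counts into [word, count] pairs with Object.entries and sort that array. This is a sketch; sortByCount and the sample counts are made up for illustration.

```javascript
// Returns [word, count] pairs, most frequent first, ties in alphabetical order.
function sortByCount(counts) {
  return Object.entries(counts).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

const counts = { fox: 1, the: 3, dog: 1, lazy: 2 };
console.log(sortByCount(counts));
// → [["the", 3], ["lazy", 2], ["dog", 1], ["fox", 1]]

// A Map iterates in insertion order, so building it from the sorted
// pairs keeps the most-frequent-first order.
const ordered = new Map(sortByCount(counts));
```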
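Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.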