
How do I find/target the most frequent word in a .txt file and change it

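I'm trying to figure out how to find the most frequently used word in a text file and change it so that it is wrapped in something else, for example freewordchoice (free + most frequent word + choice), everywhere that word appears in the text. I've been googling for something like this, but I can't find it. I'm new to JavaScript, which is what I want to use. Uploading and displaying the text works fine. What I don't understand is how to target the most frequent word and change it throughout the text before actually displaying it in the browser. In my head, I need some kind of variable that finds the word and stores it somewhere, and another variable holding what I want to add to, or change in, the target word.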

Example text: the Project Gutenberg text of Aladdin and the Wonderful Lamp

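Info/question update: my code now finds the most frequent word in the full example text above, which right now is Aladdin. The problem is that I can't replace the word Aladdin correctly. I do get fooAladdinbar printed, just as I want, but instead of only changing Aladdin to fooAladdinbar, it inserts fooAladdinbar between every letter of the example text.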

Solved: it was a problem with a variable.

This isn't a perfect approach, but it works. Here is a demo:

(This demo only finds the most frequent word.)

  • It splits the text with a regular expression
  • then counts each word
  • then returns the most frequent word

    var data = document.getElementById("data").value;
    // Split on word boundaries: words alternate with the spaces/punctuation between them
    var allWords = data.split(/\b/);
    // Object.create(null) has no inherited keys, so a word like "constructor" is safe
    var wordCountList = Object.create(null);
    allWords.forEach(function (word) {
      // Skip chunks with no word characters (spaces, punctuation, line breaks)
      if (!/\w/.test(word)) {
        return;
      }
      var key = word.toLowerCase(); // count "The" and "the" as one word
      if (!wordCountList[key]) {
        wordCountList[key] = { word: key, count: 0 };
      }
      wordCountList[key].count++;
    });
    var maxCountWord = { count: 0 };
    for (var propName in wordCountList) {
      var currentWord = wordCountList[propName];
      if (maxCountWord.count < currentWord.count) {
        maxCountWord = currentWord;
      }
    }
    console.info(maxCountWord);

    textarea { width: 100%; height: 100px; }

    <textarea id="data">
    <!-- start slipsum code -->
    The path of the righteous man is beset on all sides by the iniquities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and good will, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who would attempt to poison and destroy My brothers. And you will know My name is the Lord when I lay My vengeance upon thee.
    <!-- end slipsum code -->
    </textarea>
    <div id="result"></div>
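The same split-and-count idea can be pulled into a function that takes a string instead of reading the textarea, which makes it easier to test and reuse. This is a minimal sketch, assuming case-insensitive counting and an optional list of common words to skip; the name mostFrequentWord and the stopWords parameter are illustrative, not part of the original demo. It uses a Map rather than a plain object so word keys can never collide with built-in object properties.

```javascript
// Hypothetical helper (not from the original demo): returns the most frequent
// word in `text` and its count, ignoring case and skipping any `stopWords`.
function mostFrequentWord(text, stopWords = []) {
  const skip = new Set(stopWords.map((w) => w.toLowerCase()));
  const counts = new Map();

  // \w+ extracts runs of letters, digits and underscores; punctuation is dropped
  // (so "brother's" yields "brother" and "s")
  const words = text.toLowerCase().match(/\w+/g) || [];
  for (const word of words) {
    if (!skip.has(word)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }

  // Map keeps insertion order, so with a strict ">" the first-seen word wins ties
  let best = { word: null, count: 0 };
  for (const [word, count] of counts) {
    if (count > best.count) {
      best = { word, count };
    }
  }
  return best;
}

const sample = "The cat and the hat. The end.";
console.log(mostFrequentWord(sample).word);          // → the
console.log(mostFrequentWord(sample, ["the"]).word); // → cat
```

Passing a list of filler words as stopWords is one way to keep words like "the" and "and" from always winning, so that a content word can come out on top instead.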

To replace the word, you can also use a regular expression:
(This demo only does the replacement, here for the word "the".)

    function freewordchoice(free, word, choice) {
      var data = document.getElementById("data").innerHTML;
      // Escape regex metacharacters so the word is matched literally
      var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      // "\\b" in a string literal becomes \b (word boundary) in the regex,
      // so "the" does not match inside "thee" or "other"
      var replaceExpression = new RegExp("\\b" + escaped + "\\b", "gi");
      console.info(replaceExpression);
      // Re-insert the text that actually matched, so "The" keeps its capital
      data = data.replace(replaceExpression, function (match) {
        return free + match + choice;
      });
      document.getElementById("result").innerHTML = data;
    }
    freewordchoice("<b>", "the", "</b>");

    <b>Before:</b>
    <div id="data">
    <!-- start slipsum code -->
    The path of the righteous man is beset on all sides by the iniquities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and good will, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who would attempt to poison and destroy My brothers. And you will know My name is the Lord when I lay My vengeance upon thee.
    <!-- end slipsum code -->
    </div>
    <br/><br/>
    <b>After:</b>
    <div id="result"></div>
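The replacement step can also be written as a pure string function, which is easier to test than code that reads and writes innerHTML. This is a sketch under a few assumptions: the word is escaped so characters like "." or "?" are matched literally, an empty word is returned unchanged, and a replacer function re-inserts the text that actually matched, so with the case-insensitive "i" flag "The" stays "The". The names wrapWord and escapeRegExp are hypothetical.

```javascript
// Escape characters that have a special meaning inside a RegExp
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wraps every whole-word, case-insensitive occurrence of
// `word` in `text` with `prefix` and `suffix`.
function wrapWord(text, word, prefix, suffix) {
  if (word === "") {
    return text; // an empty pattern would match between every character
  }
  const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  // The replacer receives the matched text, so the original casing is kept
  return text.replace(pattern, (match) => prefix + match + suffix);
}

console.log(wrapWord("The path of the righteous man", "the", "<b>", "</b>"));
// → <b>The</b> path of <b>the</b> righteous man
console.log(wrapWord("Aladdin and the Wonderful Lamp", "Aladdin", "foo", "bar"));
// → fooAladdinbar and the Wonderful Lamp
```

Because \b only sits between a word character and a non-word character, this sketch assumes the target word starts and ends with letters or digits, which holds for anything a word counter based on \w would return.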

Update:

The problem is this line:

    common = 'the,a,do,in,with,this,so,that,of,and,not,did,when,what,were,went,was,as,if,who,had,at,can,you,which,while,will,to,till,then,them,their,she,he,once,out,no,must,many,me,is,it,his,him,her,about,have,i,has,your,would,where,whom,s,on,from,for,by,but,all,said,my,';

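The problem is the end of the string, said,my,' (note the comma right before the closing quote). Remove that last comma and it should work, like this: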

    common = 'the,a,do,in,with,this,so,that,of,and,not,did,when,what,were,went,was,as,if,who,had,at,can,you,which,while,will,to,till,then,them,their,she,he,once,out,no,must,many,me,is,it,his,him,her,about,have,i,has,your,would,where,whom,s,on,from,for,by,but,all,said,my';

Because of that trailing comma, the last word in the list is an empty string.
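A quick way to see the effect, assuming the list is turned into an array with split(","): the trailing comma produces an empty last element. If each entry were then used to build a regular expression (an assumption, since the asker's code isn't shown), the empty pattern matches at every position between characters, which would be consistent with fooAladdinbar appearing between every letter. Trimming and filtering out empty entries after splitting guards against this even if a stray comma slips back in; the variable names here are illustrative.

```javascript
const withTrailingComma = "said,my,";
const fixed = "said,my";

console.log(withTrailingComma.split(",")); // → [ 'said', 'my', '' ]
console.log(fixed.split(","));             // → [ 'said', 'my' ]

// An empty pattern matches at every position, including between letters
console.log("abc".replace(new RegExp("", "g"), "-")); // → -a-b-c-

// Defensive version: trim each entry and drop empty ones after splitting
const commonWords = withTrailingComma
  .split(",")
  .map((w) => w.trim())
  .filter((w) => w.length > 0);
console.log(commonWords); // → [ 'said', 'my' ]
```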


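Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.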

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM