Find unique words and their counts

Question

Can I convert the strings to lowercase without running a loop over "words" in the code below?

class TextFileHandeling:
    def __init__(self,path,mode):
        self.path=path
        self.mode=mode


    def reading_file(self):
        file_read= open(self.path,self.mode)
        lines=file_read.read() 
        words=lines.split()   #split strings into words
        return words
        file_read.close()      


    def writing_file(self,words):
        unique=[]
        file_write= open(self.path,self.mode)
        small_letters=[]

        # this is the loop I would like to avoid
        for i in words:
            small_letters.append(i.lower())

        for j in small_letters:
            if j not in unique:
                unique.append(j)
                file_write.write(f"{str(j)} {small_letters.count(j)}\n")
        return file_write
        file_write.close()

read_file=TextFileHandeling('D:\\python_practise\\read.txt','r')
write_file=TextFileHandeling('D:\\python_practise\\reader.txt','w')

words= read_file.reading_file()
write_file.writing_file(words)

Answer 1

You are already iterating over your small_letters list, so why not convert each word to lowercase inside that loop?

unique = []
for word in words:
    lowered = word.lower()  # lowercase inside the existing loop
    if lowered not in unique:
        unique.append(lowered)

I also don't know of a way to lowercase the words in a list without looping over it.
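One way to avoid the per-word loop, sketched here under the assumption that the whole file fits in memory, is to call lower() once on the full text before splitting it. The sample string below is made up and stands in for the file contents.

```python
# Hypothetical sample text standing in for the contents of read.txt
text = "The quick brown fox jumps over the lazy dog The FOX"

# Lowercase the whole string once, then split it into words
words = text.lower().split()

print(words)
```

The lowercasing still visits every character internally, but no explicit Python loop over the word list is needed.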

Answer 2

First of all, any statement placed after the return statement never runs, so you are not closing the file. So

    def reading_file(self):
        file_read= open(self.path,self.mode)
        lines=file_read.read() 
        words=lines.split()  
        return words
        file_read.close()

must become:

    def reading_file(self):
        file_read= open(self.path,self.mode)
        lines=file_read.read() 
        words=lines.split()  
        file_read.close() 
        return words

I would also suggest renaming the function to get_words, because what it really does is give you a list of words, not just read a file.
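Both suggestions can be combined in a minimal sketch. It assumes a standalone get_words function rather than a method, and uses a with block so the file is closed automatically. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

def get_words(path):
    """Return the whitespace-separated words found in the file at `path`."""
    # The with block closes the file even if read() raises an exception
    with open(path, "r") as file_read:
        return file_read.read().split()

# Demo on a throwaway file (file name and contents are hypothetical)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "read.txt")
    with open(sample_path, "w") as f:
        f.write("Hello world\nhello again")
    words = get_words(sample_path)

print(words)
```

With this shape there is no close() call to misplace, so the return statement can safely sit inside the with block.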

To lowercase all the words, use the map function:

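def my_lower(str_in):
    return str_in.lower()

# map() returns a lazy iterator in Python 3; wrap it in list() because
# the question's code calls small_letters.count()
small_letters = list(map(my_lower, words))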
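As a possible shortcut, the helper function is not strictly needed: the built-in str.lower method can be passed to map directly. The input list here is made up for illustration.

```python
words = ["Hello", "World", "HELLO"]  # hypothetical input

# str.lower is called on each element; list() materializes the result
small_letters = list(map(str.lower, words))

print(small_letters)
```

This only works when every element is a str; a custom helper is still useful if the items need extra cleaning.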

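Also, since it looks like you are using unique as a set, to check whether a word has already been written, consider changing the data structure of unique to a set. You can find the documentation for sets here: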

https://docs.python.org/2/library/sets.html
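A minimal sketch of the set-based version follows. It assumes the built-in set type and writes to an in-memory buffer instead of a real file. The function name write_unique_counts and the sample words are hypothetical.

```python
import io

def write_unique_counts(words, out):
    """Write one 'word count' line per distinct lowercase word to `out`."""
    small_letters = [w.lower() for w in words]
    seen = set()  # membership tests on a set take constant time on average
    for word in small_letters:
        if word not in seen:
            seen.add(word)
            out.write(f"{word} {small_letters.count(word)}\n")

# io.StringIO stands in for the output file in this demo
buffer = io.StringIO()
write_unique_counts(["Apple", "banana", "apple", "Cherry", "BANANA", "apple"], buffer)
result = buffer.getvalue()

print(result)
```

The set speeds up the "already written?" check, but list.count() still rescans the whole list for every distinct word.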

Answer 3

This should work:

from collections import Counter
from string import ascii_letters

with open('somefile.txt') as fin:
    text = fin.read()

# filter out the punctuation: replace every non-letter character with a space
filtered = ''.join(t if t in ascii_letters else ' ' for t in text.lower())

# map each lowercase word to the number of times it occurs
counts = Counter(filtered.split())
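To produce the "word count" lines that the question writes to a file, the Counter result could be iterated with most_common(). The sketch below assumes regex tokenization (re.findall), which is swapped in here and is not part of the answer above. It uses a made-up sample string and an in-memory buffer in place of real files.

```python
import io
import re
from collections import Counter

# Hypothetical sample text standing in for the file contents
text = "The cat saw the dog. The dog ran!"

# Keep only runs of ASCII letters, after lowercasing the whole text once
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

# Write one "word count" line per distinct word, most frequent first
out = io.StringIO()
for word, count in counts.most_common():
    out.write(f"{word} {count}\n")
output = out.getvalue()

print(output)
```

Counter counts every word in a single pass, so it avoids calling list.count() once per distinct word.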

3 answers

Answer 1
0 2020-04-04 14:41:49

Answer 2
0 2020-04-04 14:58:25

Answer 3
-1 2020-04-04 14:46:52
