Counting the number of characters of every word in every line of a file

Question

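This code prints the total number of lines, words, and characters in a text file. It works fine and gives the expected output. But I want to count the number of characters in each line and print it like this: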

Line No. 1 has 58 Characters
Line No. 2 has 24 Characters

Code:

import string
def fileCount(fname):
    #counting variables
    lineCount = 0
    wordCount = 0
    charCount = 0
    words = []

    #file is opened and assigned a variable
    infile = open(fname, 'r')

    #loop that finds the number of lines in the file
    for line in infile:
        lineCount = lineCount + 1
        word = line.split()
        words = words + word

    #loop that finds the number of words in the file
    for word in words:
        wordCount = wordCount + 1
        #loop that finds the number of characters in the file
        for char in word:
            charCount = charCount + 1
    #returns the variables so they can be called to the main function        
    return(lineCount, wordCount, charCount)

def main():
    fname = input('Enter the name of the file to be used: ')
    lineCount, wordCount, charCount = fileCount(fname)
    print ("There are", lineCount, "lines in the file.")
    print ("There are", charCount, "characters in the file.")
    print ("There are", wordCount, "words in the file.")
main()

The loop

for line in infile:
    lineCount = lineCount + 1

counts whole lines, but how can I do the same for the characters in each individual line? I am using Python 3.x.

Answer 1

Store all the information in a dictionary, then access it by key.

def fileCount(fname):
    #counting variables
    d = {"lines":0, "words": 0, "lengths":[]}
    #file is opened and assigned a variable
    with open(fname, 'r') as f:
        for line in f:
            # split into words
            spl = line.split()
            # increase count for each line
            d["lines"] += 1
            # add length of split list which will give total words
            d["words"] += len(spl)
            # get the length of each word and sum
            d["lengths"].append(sum(len(word) for word in spl))
    return d

def main():
    fname = input('Enter the name of the file to be used: ')
    data = fileCount(fname)
    print ("There are {lines} lines in the file.".format(**data))
    print ("There are {} characters in the file.".format(sum(data["lengths"])))
    print ("There are {words} words in the file.".format(**data))
    # enumerate over the lengths, outputting char count for each line
    for ind, s in enumerate(data["lengths"], 1):
        print("Line: {} has {} characters.".format(ind, s))
main()

This code only works for words separated by whitespace, so you need to keep that in mind.
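To illustrate the whitespace caveat: `str.split()` leaves punctuation attached to the words, so "Hello, world!" counts as 12 characters. Here is a minimal sketch, assuming you only want letters, digits and underscores to count. The helper name `chars_per_line` and the `\w+` pattern are my own choices for illustration, not part of the answer above.

```python
import re

def chars_per_line(lines, pattern=r"\w+"):
    """Return a list with the number of word characters on each line.

    `pattern` (a hypothetical default) decides what counts as a word.
    """
    counts = []
    for line in lines:
        words = re.findall(pattern, line)
        counts.append(sum(len(word) for word in words))
    return counts

sample = ["Hello, world!\n", "one-two three\n"]

# Whitespace splitting keeps punctuation attached to the words
split_counts = [sum(len(w) for w in line.split()) for line in sample]
regex_counts = chars_per_line(sample)

for number, (by_split, by_regex) in enumerate(zip(split_counts, regex_counts), 1):
    print("Line No. {} has {} characters with split() and {} with the regex".format(
        number, by_split, by_regex))
```

Which of the two counts is "right" depends on whether punctuation should count as a character for your purposes.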

Answer 2

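Define a set of the allowed characters that you wish to count, then you can use len to get most of the data.
Below, I have chosen the character set: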

['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~']

#Define desired character set
valid_chars = set([chr(i) for i in range(33,127)])
total_lines = total_words = total_chars = 0
line_details = []

with open ('test.txt', 'r') as f:
    for line in f:
        total_lines += 1
        line_char_count = len([char for char in line if char in valid_chars])
        total_chars += line_char_count
        total_words += len(line.split())
        line_details.append("Line %d has %d characters" % (total_lines, line_char_count))

print ("There are", total_lines, "lines in the file.")
print ("There are", total_chars, "characters in the file.")
print ("There are", total_words, "words in the file.")
for line in line_details:
    print (line)
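The range `chr(33)` to `chr(126)` used above covers the printable ASCII characters except the space. As a rough sketch, the same set can also be assembled from the named constants in the standard `string` module, which may be easier to adjust if you want to drop punctuation, for example. The helper `count_valid` is a hypothetical name introduced here for illustration.

```python
import string

# chr(33)..chr(126): printable ASCII characters, excluding the space
valid_chars = set(chr(i) for i in range(33, 127))

# The same set built from named constants in the string module
named_chars = set(string.ascii_letters + string.digits + string.punctuation)

print(valid_chars == named_chars)
print(len(valid_chars))

def count_valid(line, allowed=named_chars):
    """Count the characters of `line` that belong to `allowed` (hypothetical helper)."""
    return sum(1 for char in line if char in allowed)

# Spaces and the trailing newline are not in the set, so they are skipped
print(count_valid("Hello, world!\n"))
```

Passing `set(string.ascii_letters)` as `allowed` would count letters only.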

Answer 3

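I was assigned the task of creating a program that prints the number of characters in a line.

As a noob in programming, I found this difficult :(.

Here is what I came up with, along with his response:

This is the core part of the program: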

with open ('data_vis_tips.txt', 'r') as inFile:
    with open ('count_chars_per_line.txt', 'w') as outFile:
        chars = 0
        for line in inFile:
            # drop the trailing newline so it is not counted
            line = line.strip('\n')
            chars = len(line)
            outFile.write(str(chars)+'\n')

It can be simplified to:

with open ('data_vis_tips.txt', 'r') as inFile:
    for line in inFile:
        line = line.strip()
        num_chars = len(line)
        print(num_chars)

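Note that the strip() function does not need an argument. By default it removes whitespace, and '\n' is whitespace. Keep in mind that strip() also removes leading and trailing spaces and tabs, so the count can differ from strip('\n') for lines that start or end with them.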
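A small sketch of how the choice of stripping call affects the per-line count. The sample line and the helper name `line_lengths` are made up for illustration.

```python
def line_lengths(line):
    """Return the length of `line` under different stripping choices."""
    return {
        "raw": len(line),                        # newline included
        "strip_newline": len(line.strip('\n')),  # only newlines removed from both ends
        "rstrip_newline": len(line.rstrip('\n')),  # only trailing newlines removed
        "strip_all": len(line.strip()),          # all leading/trailing whitespace removed
    }

# Two leading spaces, one trailing space, then the newline
sample_line = "  indented text \n"
lengths = line_lengths(sample_line)
for name, value in lengths.items():
    print(name, value)
```

If leading and trailing spaces should count as characters, `rstrip('\n')` is probably the safer choice; if they should not, `strip()` is fine.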

Answer 4

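Here is a simpler version using the built-in collections.Counter. Counter is a specialized dictionary that counts its inputs. We can use the Counter.update() method to feed in all the words on each line (whether unique or not):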

from collections import Counter

def file_count_2(fname):

    line_count = 0
    word_counter = Counter()

    # the with block closes the file once the loop is done
    with open(fname, 'r') as infile:
        for line in infile:
            line_count += 1
            word_counter.update( line.split() )

    word_count = 0
    char_count = 0

    for word, cnt in word_counter.items():
        word_count += cnt
        char_count += cnt * len(word)

    print(word_counter)

    return line_count, word_count, char_count

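Notes:

I tested this and it gives the same counts as your code.
It will be faster because you don't have to iteratively append to the list words (it is better to hash only the unique words and store their counts, which is what Counter does), and you don't need to iterate and increment charCount every time an occurrence of a word is seen.
If you only want word_count and not char_count, you can take word_count = sum(word_counter.values()) directly without iterating over word_counter.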
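The notes above can be sketched as a compact variant that computes both totals with sum(). It takes any iterable of lines rather than a file name, so it can be tried on an in-memory sample; the function name `count_with_counter` and the sample text are assumptions for illustration.

```python
from collections import Counter
import io

def count_with_counter(lines):
    """Return (line_count, word_count, char_count) for an iterable of lines."""
    line_count = 0
    word_counter = Counter()
    for line in lines:
        line_count += 1
        word_counter.update(line.split())

    # Each unique word contributes its count, and count * length characters
    word_count = sum(word_counter.values())
    char_count = sum(cnt * len(word) for word, cnt in word_counter.items())
    return line_count, word_count, char_count

# io.StringIO stands in for an open text file here
sample = io.StringIO("the cat and the hat\nthe end\n")
print(count_with_counter(sample))
```

Because the Counter merges the words from all lines, this gives file totals only; per-line character counts would still need a separate list, as in the other answers.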

4 answers

Answer 1
1 2015-07-12 18:12:18

Answer 2
0 2018-05-12 09:04:44

Answer 3
-1 2018-05-12 08:07:32

Answer 4
-1 2018-05-13 09:05:02
