
Count the number of characters in every word of every line of a file

This code prints the total number of lines, the total number of words and the total number of characters in a text file. It works fine and gives the expected output. But I want to count the number of characters in each line and print it like this:

Line No. 1 has 58 Characters
Line No. 2 has 24 Characters

Code:

def fileCount(fname):
    #counting variables
    lineCount = 0
    wordCount = 0
    charCount = 0
    words = []

    #file is opened, and closed automatically by the with statement
    with open(fname, 'r') as infile:
        #loop that finds the number of lines in the file
        for line in infile:
            lineCount = lineCount + 1
            word = line.split()
            words = words + word

    #loop that finds the number of words in the file
    for word in words:
        wordCount = wordCount + 1
        #loop that finds the number of characters in the file
        for char in word:
            charCount = charCount + 1
    #returns the variables so they can be used in the main function
    return(lineCount, wordCount, charCount)

def main():
    fname = input('Enter the name of the file to be used: ')
    lineCount, wordCount, charCount = fileCount(fname)
    print ("There are", lineCount, "lines in the file.")
    print ("There are", charCount, "characters in the file.")
    print ("There are", wordCount, "words in the file.")
main()

The loop

for line in infile:
    lineCount = lineCount + 1

counts all the lines together. How can I apply this operation to each line individually? I am using Python 3.x.

Store all the info in a dict, then access it by key.

def fileCount(fname):
    #counting variables
    d = {"lines":0, "words": 0, "lengths":[]}
    #file is opened and assigned a variable
    with open(fname, 'r') as f:
        for line in f:
            # split into words
            spl = line.split()
            # increase count for each line
            d["lines"] += 1
            # add length of split list which will give total words
            d["words"] += len(spl)
            # get the length of each word and sum
            d["lengths"].append(sum(len(word) for word in spl))
    return d

def main():
    fname = input('Enter the name of the file to be used: ')
    data = fileCount(fname)
    print ("There are {lines} lines in the file.".format(**data))
    print ("There are {} characters in the file.".format(sum(data["lengths"])))
    print ("There are {words} words in the file.".format(**data))
    # enumerate over the lengths, outputting char count for each line
    for ind, s in enumerate(data["lengths"], 1):
        print("Line: {} has {} characters.".format(ind, s))
main()

The code only works for words delimited by whitespace, so keep that in mind.
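To make that caveat concrete, here is a small sketch comparing two possible definitions of "characters per line": the sum of word lengths (what the code above computes, which skips spaces and tabs) and the length of the line with only its trailing newline removed. The sample text is made up for illustration, and `io.StringIO` stands in for a real file.

```python
import io

def per_line_counts(f):
    """Yield (line_no, chars_in_words, chars_including_spaces) for each line."""
    for line_no, line in enumerate(f, 1):
        # Same approach as the answer above: sum the lengths of whitespace-split words
        in_words = sum(len(word) for word in line.split())
        # Alternative: count every character except the trailing newline
        with_spaces = len(line.rstrip("\n"))
        yield line_no, in_words, with_spaces

# In-memory stand-in for a real text file (hypothetical sample data)
sample = io.StringIO("hello world\n  two  spaces\n")
results = list(per_line_counts(sample))
for line_no, in_words, with_spaces in results:
    print("Line No. {} has {} characters ({} including spaces)".format(line_no, in_words, with_spaces))
```

Which definition is right depends on what "characters" should mean for your assignment; both can be computed in the same pass.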

Define a set of the allowed characters that you want to count; then you can use len to get most of the data.
Below, I have chosen this character set:

['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~']

#Define desired character set
valid_chars = set([chr(i) for i in range(33,127)])
total_lines = total_words = total_chars = 0
line_details = []

with open ('test.txt', 'r') as f:
    for line in f:
        total_lines += 1
        line_char_count = len([char for char in line if char in valid_chars])
        total_chars += line_char_count
        total_words += len(line.split())
        line_details.append("Line %d has %d characters" % (total_lines, line_char_count))

print ("There are", total_lines, "lines in the file.")
print ("There are", total_chars, "characters in the file.")
print ("There are", total_words, "words in the file.")
for line in line_details:
    print (line)
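The hand-written list above is exactly the printable, non-whitespace ASCII range produced by `range(33, 127)`. As a quick sanity check (a sketch, not part of the original answer), the same set can be built from the standard `string` module, which may be easier to read than raw character codes:

```python
import string

# Character set from the answer: ASCII codes 33 ('!') through 126 ('~')
valid_chars = set(chr(i) for i in range(33, 127))

# Same set built from the string module: printable characters minus whitespace
from_string_module = set(string.printable) - set(string.whitespace)

print(len(valid_chars))                   # 94
print(valid_chars == from_string_module)  # True
```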

I was assigned the task of creating a program that prints the number of characters in a line.

As a noob to programming, I found this very difficult :(.

Here is what I came up with, along with his response:

Here's the core part of your program:

with open ('data_vis_tips.txt', 'r') as inFile:
    with open ('count_chars_per_line.txt', 'w') as outFile:
        chars = 0
        for line in inFile:
            line = line.strip('\n')
            chars = len(line)
            outFile.write(str(chars)+'\n')

It could be simplified to this:

with open ('data_vis_tips.txt', 'r') as inFile:
    for line in inFile:
        line = line.strip()
        num_chars = len(line)
        print(num_chars)

Note that the argument to the strip() function isn't required; by default it strips leading and trailing whitespace, and '\n' is whitespace.
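One difference worth knowing: `strip()` with no argument removes all leading and trailing whitespace, not only the newline, so for lines that start or end with spaces it gives a smaller count than `strip('\n')`. A small sketch on a made-up line:

```python
line = "  indented text  \n"

only_newline = len(line.strip("\n"))  # keeps the surrounding spaces
all_whitespace = len(line.strip())    # drops the spaces and the newline
print(only_newline, all_whitespace)
```

If surrounding spaces should count as characters, `strip('\n')` (or `rstrip('\n')`) is the safer choice.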

Here is a simpler version using the standard-library collections.Counter, a specialized dict that counts its inputs. We can use the Counter.update() method to slurp in all words (unique or not) on each line:

from collections import Counter

def file_count_2(fname):

    line_count = 0
    word_counter = Counter()

    # the with statement closes the file when the loop finishes
    with open(fname, 'r') as infile:
        for line in infile:
            line_count += 1
            word_counter.update(line.split())

    word_count = 0
    char_count = 0

    for word, cnt in word_counter.items():
        word_count += cnt
        char_count += cnt * len(word)

    print(word_counter)

    return line_count, word_count, char_count

Notes:

  • I tested this and it gives the same counts as your code.
  • It will be much faster, since you're not repeatedly appending to the list words (it's better to hash only the unique words and store their counts, which is what Counter does), and there's no need to iterate and increment charCount every time we see an occurrence of a word.
  • If you only wanted word_count and not char_count, you could directly take word_count = sum(word_counter.values()) without needing to iterate over word_counter.
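To illustrate the last note, here is a small sketch on in-memory text (the sample lines are hypothetical). `sum(word_counter.values())` gives the total word count directly, while the character count still needs each word's length weighted by how often it occurs:

```python
from collections import Counter

lines = ["the cat sat\n", "the cat ran away\n"]

word_counter = Counter()
for line in lines:
    # update() adds one to the count of each word in the split line
    word_counter.update(line.split())

word_count = sum(word_counter.values())
char_count = sum(cnt * len(word) for word, cnt in word_counter.items())
print(word_count, char_count)
```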
