
Count the number of characters in every word of every line of a file

This code prints the total number of lines, words, and characters in a text file. It works and gives the expected output, but I want to count the number of characters in each line and print it like this:

Line No. 1 has 58 Characters
Line No. 2 has 24 Characters

Code:

def fileCount(fname):
    #counting variables
    lineCount = 0
    wordCount = 0
    charCount = 0
    words = []

    #file is opened and assigned a variable ("with" closes it automatically)
    with open(fname, 'r') as infile:
        #loop that finds the number of lines in the file
        for line in infile:
            lineCount = lineCount + 1
            word = line.split()
            words = words + word

    #loop that finds the number of words in the file
    for word in words:
        wordCount = wordCount + 1
        #loop that finds the number of characters in the file
        for char in word:
            charCount = charCount + 1
    #returns the variables so they can be called to the main function        
    return(lineCount, wordCount, charCount)

def main():
    fname = input('Enter the name of the file to be used: ')
    lineCount, wordCount, charCount = fileCount(fname)
    print ("There are", lineCount, "lines in the file.")
    print ("There are", charCount, "characters in the file.")
    print ("There are", wordCount, "words in the file.")
main()

The loop

for line in infile:
    lineCount = lineCount + 1

counts all the lines together, but how can I apply this operation to each line individually? I am using Python 3.x.

Store all the info in a dict, then access it by key.

def fileCount(fname):
    #counting variables
    d = {"lines":0, "words": 0, "lengths":[]}
    #file is opened and assigned a variable
    with open(fname, 'r') as f:
        for line in f:
            # split into words
            spl = line.split()
            # increase count for each line
            d["lines"] += 1
            # add length of split list which will give total words
            d["words"] += len(spl)
            # get the length of each word and sum
            d["lengths"].append(sum(len(word) for word in spl))
    return d

def main():
    fname = input('Enter the name of the file to be used: ')
    data = fileCount(fname)
    print ("There are {lines} lines in the file.".format(**data))
    print ("There are {} characters in the file.".format(sum(data["lengths"])))
    print ("There are {words} words in the file.".format(**data))
    # enumerate over the lengths, outputting char count for each line
    for ind, s in enumerate(data["lengths"], 1):
        print("Line: {} has {} characters.".format(ind, s))
main()

This code only works for words delimited by whitespace, so keep that in mind.
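A minimal sketch, assuming you want the exact "Line No. N has M Characters" format from the question. It reports two counts per line: the characters inside whitespace-delimited words (what the question's code counts) and the raw length of the line with only the line ending removed, so the effect of the whitespace limitation is visible. The helper name `per_line_counts` is hypothetical, and `io.StringIO` stands in for an open file.

```python
import io
from typing import Iterable, List, Tuple


def per_line_counts(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (line_number, word_chars, raw_chars) for each line.

    word_chars: characters inside whitespace-delimited words (the
    question's definition, so spaces and tabs are excluded).
    raw_chars: len() of the line after removing only the line ending.
    """
    results = []
    for number, line in enumerate(lines, start=1):
        word_chars = sum(len(word) for word in line.split())
        raw_chars = len(line.rstrip("\r\n"))
        results.append((number, word_chars, raw_chars))
    return results


if __name__ == "__main__":
    # io.StringIO yields lines the same way an open text file does.
    sample = io.StringIO("hello world\n  two  spaces\n")
    for number, word_chars, raw_chars in per_line_counts(sample):
        print("Line No. {} has {} Characters".format(number, word_chars))
```

With a real file, pass the open file object instead, e.g. `with open(fname) as f: per_line_counts(f)`.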

Define a set of the characters you want to count; then you can use len to get most of the data.
Below, I have chosen this character set (printable ASCII without the space):

['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~']

#Define desired character set
valid_chars = set([chr(i) for i in range(33,127)])
total_lines = total_words = total_chars = 0
line_details = []

with open ('test.txt', 'r') as f:
    for line in f:
        total_lines += 1
        line_char_count = len([char for char in line if char in valid_chars])
        total_chars += line_char_count
        total_words += len(line.split())
        line_details.append("Line %d has %d characters" % (total_lines, line_char_count))

print ("There are", total_lines, "lines in the file.")
print ("There are", total_chars, "characters in the file.")
print ("There are", total_words, "words in the file.")
for line in line_details:
    print (line)
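The character set listed above is printable ASCII minus the space (codes 33 to 126), which is exactly the union of `string.digits`, `string.ascii_letters`, and `string.punctuation`. A sketch, assuming that set is what you want, that builds it from the `string` module and counts with `sum()` instead of building a temporary list; `count_valid_chars` is a hypothetical helper name.

```python
import string

# Printable ASCII without whitespace: the same 94 characters as chr(33)..chr(126).
VALID_CHARS = frozenset(string.digits + string.ascii_letters + string.punctuation)


def count_valid_chars(line: str) -> int:
    """Count the characters in `line` that belong to VALID_CHARS."""
    return sum(1 for char in line if char in VALID_CHARS)


if __name__ == "__main__":
    # Same set as the range-based definition: chr(33) through chr(126).
    assert VALID_CHARS == {chr(i) for i in range(33, 127)}
    print(count_valid_chars("Hi, there!\n"))  # prints 9: space and newline are skipped
```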

I was assigned the task of creating a program that prints the number of characters in a line.

As a programming noob, I found this very difficult :(

Here is what I came up with, along with his response:

Here's the core part of your program:

with open ('data_vis_tips.txt', 'r') as inFile:
    with open ('count_chars_per_line.txt', 'w') as outFile:
        chars = 0
        for line in inFile:
            line = line.strip('\n')
            chars = len(line)
            outFile.write(str(len(line))+'\n')

It could be simplified to this:

with open ('data_vis_tips.txt', 'r') as inFile:
    for line in inFile:
        line = line.strip()
        num_chars = len(line)
        print(num_chars)

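Note that the argument to strip() isn't required; it strips whitespace by default, and '\n' is whitespace. Without an argument, however, it also removes leading and trailing spaces and tabs, so the count can differ from the strip('\n') version on lines that start or end with spaces.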
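A small sketch comparing `strip('\n')` with plain `strip()` on a hypothetical line that has leading and trailing spaces; the two helper names are illustrative only.

```python
def chars_strip_newline(line: str) -> int:
    """Length after removing only newline characters at either end."""
    return len(line.strip("\n"))


def chars_strip_all(line: str) -> int:
    """Length after removing all leading and trailing whitespace."""
    return len(line.strip())


if __name__ == "__main__":
    line = "    indented text  \n"
    print(chars_strip_newline(line))  # 19: 4 spaces + "indented text" + 2 spaces
    print(chars_strip_all(line))      # 13: just "indented text"
```

For lines with no surrounding spaces the two agree; where they differ, pick the one that matches whether surrounding spaces should count as characters.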

Here is a simpler version using the standard-library collections.Counter, a specialized dict that counts its inputs. We can use the Counter.update() method to add all the words (unique or not) on each line:

from collections import Counter

def file_count_2(fname):

    line_count = 0
    word_counter = Counter()

    with open(fname, 'r') as infile:
        for line in infile:
            line_count += 1
            word_counter.update(line.split())

    word_count = 0
    char_count = 0

    for word, cnt in word_counter.items():
        word_count += cnt
        char_count += cnt * len(word)

    print(word_counter)

    return line_count, word_count, char_count

Notes:

  • I tested this and it gives identical counts to your code.
  • It should be considerably faster, since you're not repeatedly rebuilding the list words (each words = words + word copies the whole list); it's better to hash only the unique words and store their counts, which is what Counter does. There's also no need to iterate and increment charCount for every character of every word occurrence.
  • If you only wanted word_count and not char_count, you could directly take word_count = sum(word_counter.values()) without iterating over word_counter.
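A sketch, assuming the input is any iterable of lines (an open file works), that combines the Counter approach with the per-line output the question asks for. `word_count` uses the `sum(word_counter.values())` shortcut; on Python 3.10+, `Counter.total()` returns the same sum. The name `file_count_3` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def file_count_3(lines: Iterable[str]) -> Tuple[int, int, int, List[int]]:
    """Return (line_count, word_count, char_count, chars_per_line)."""
    word_counter = Counter()
    chars_per_line = []
    for line in lines:
        words = line.split()
        word_counter.update(words)
        # Characters inside the words of this line (whitespace excluded).
        chars_per_line.append(sum(len(word) for word in words))

    line_count = len(chars_per_line)
    word_count = sum(word_counter.values())
    # Each distinct word contributes its length once per occurrence.
    char_count = sum(cnt * len(word) for word, cnt in word_counter.items())
    return line_count, word_count, char_count, chars_per_line


if __name__ == "__main__":
    sample = ["the cat\n", "the hat sat\n"]
    lines, words, chars, per_line = file_count_3(sample)
    for number, count in enumerate(per_line, start=1):
        print("Line No. {} has {} Characters".format(number, count))
```

Here char_count always equals sum(chars_per_line); it is computed from the counter only to mirror the Counter-based answer.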

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM