Counting how many words each line has in a text file with Python (using str.split)

I have two files. The input file, "our_input.txt" (in the same directory as the code file), contains Oscar Wilde's Dorian Gray. There is also an output file. I need to open the input file, have Python count how many words each line has, and write the counts to the output file.

I tried, but I got lost...

You can try something like this.

First you read your input file:

with open('our_input.txt') as f:
    lines = f.readlines()

Then you count the number of words per line and write to the output file:

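with open('our_output.txt', 'w') as f:
    for index, value in enumerate(lines):
        # split() with no arguments splits on any run of whitespace
        number_of_words = len(value.split())
        # index is zero-based, so add 1 for a human-readable line number
        f.write('Line number {} has {} words.\n'.format(index + 1, number_of_words))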

You will need to iterate over each line of the input text file, which is done with a standard for loop. You can then split each line on whitespace with str.split() and use len() to count the elements in the resulting list. Write that number to the output file and you are done.
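
As a rough sketch of that loop, the function below streams the input file line by line instead of loading it all with readlines(). The name count_words_per_line, its parameters, and the UTF-8 encoding are assumptions made for illustration; they do not come from the question.

```python
def count_words_per_line(input_path, output_path):
    """Write one report line per input line and return the list of word counts."""
    counts = []
    # encoding='utf-8' is an assumption; adjust it to match the actual file.
    with open(input_path, encoding='utf-8') as src, \
            open(output_path, 'w', encoding='utf-8') as dst:
        # Iterating over the file object yields one line at a time.
        for number, line in enumerate(src, start=1):
            words = len(line.split())  # split() ignores runs of whitespace
            counts.append(words)
            dst.write('Line number {} has {} words.\n'.format(number, words))
    return counts
```

Calling count_words_per_line('our_input.txt', 'our_output.txt') would write the same kind of report as the earlier answer; an empty line is reported as having 0 words.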

A simple technique in any language for word counting in files is:

  1. Read file into a variable.
  2. Replace unnecessary characters such as carriage returns or line feeds with a space character. Trim space characters from the beginning and end of the string.
  3. Replace each run of multiple space characters with a single space.

We now have a string with words separated by single spaces.
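
In Python, the clean-up steps above might be sketched as follows. This version merges steps 2 and 3 into one regular expression, which is a choice made here rather than something the answer prescribes; normalize_whitespace is a hypothetical helper name.

```python
import re


def normalize_whitespace(text):
    """Turn every run of whitespace (spaces, tabs, CR, LF) into one space, then trim."""
    # \s+ matches one or more whitespace characters, including \r and \n.
    collapsed = re.sub(r'\s+', ' ', text)
    # strip() removes the single leading or trailing space that may remain.
    return collapsed.strip()
```

For example, normalize_whitespace('  The  artist\r\nis the\tcreator \n') gives 'The artist is the creator'.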

Now either

  • Use the language's split function with a space as the delimiter to produce an array. The number of words is the array length. (If the language only reports the array's upper bound, adjust for whether its lower bound is zero or 1.)

or

  • If the language has a function that counts occurrences of a given character, use it to count the spaces in the string and add 1. This is the number of words.
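
Both options can be sketched in Python as below, assuming the input is already a string of words separated by single spaces with no leading or trailing space. The function names are hypothetical. The empty-string guard is an addition: without it, either method would report one word for an empty string.

```python
def count_by_split(normalized):
    """Count words by splitting on a single space."""
    # ''.split(' ') returns [''], so an empty string needs a special case.
    return len(normalized.split(' ')) if normalized else 0


def count_by_spaces(normalized):
    """Count words as the number of separating spaces plus one."""
    return normalized.count(' ') + 1 if normalized else 0
```

Both functions return 5 for 'The artist is the creator'. In Python specifically, calling split() with no argument already skips repeated whitespace, so the normalization is only needed for the explicit-delimiter forms shown here.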

The size of the file being worked upon could make this a weighty job for the processor and performance will depend on how the language handles strings and arrays.

If you are working client-server, or the text is stored in a database, consider the high network cost of moving the string. It is better to run the count as close to the data as possible. So if you are using an RDBMS, use a stored procedure: it is faster to count the words in a 2 GB string on the server and ship an integer answer to the client than to ship the 2 GB string and count in a web browser.

If you cannot read the entire file in one pass, you can read it line by line and apply the above techniques to each line. However, because of string-handling and loop overhead, processing the entire file as one string will be faster when that is possible.
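
For the line-by-line case, a minimal Python sketch of a whole-file total could look like this. The name total_words and the UTF-8 encoding are assumptions for illustration.

```python
def total_words(path):
    """Return the total word count of a file, holding only one line in memory at a time."""
    # encoding='utf-8' is an assumption; adjust it to match the actual file.
    with open(path, encoding='utf-8') as f:
        # The generator expression consumes the file lazily, line by line.
        return sum(len(line.split()) for line in f)
```

Because the file object is iterated lazily, memory use stays roughly proportional to the longest line rather than to the file size.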
