
How to count how many lines contain a specific word

I'm not sure whether the if statement is wrong. I tried to split each line, iterate over the words to find 'the raven', and return the count.

def count_word(file_url, word):
    r = requests.get(file_url, stream=True)
    count = 0

    for line in r.iter_lines():
        words = line.split()
        if line[1:] == 'the raven':
            count += 1
    return count

When you do

`words = line.split()`

you're assigning to the variable words a list of strings - the non-whitespace strings in the line. But you're not doing anything with words after that. Instead, you do:

if line[1:] == 'the raven':

which checks if the whole line, minus its first character, is exactly 'the raven'.
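To make the difference concrete, here is a small sketch of what those two expressions produce. The sample line is made up for illustration, and it assumes Python 3, where each line arrives as a bytes object:

```python
# Made-up sample line; in Python 3 each streamed line is a bytes object.
line = b"Quoth the raven, nevermore"

words = line.split()   # whitespace-separated pieces of the line
rest = line[1:]        # the whole line minus its first byte

# In Python 3 a bytes object never compares equal to a str,
# so this is False regardless of what the line contains.
matches = rest == 'the raven'

print(words)
print(rest)
print(matches)
```

So the condition in the question can never be true for a line like this one, and `words` is never consulted.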

(Edited to handle unicode/bytes.) If you want the total number of times 'the raven' appears in the whole file, you can skip the split and the if and count the occurrences in each line directly. Because iter_lines() yields bytes objects by default (bytes in Python 3, byte-string str in Python 2), you'll need to decode each line with the appropriate encoding first:

for line in r.iter_lines():
    count += line.decode('utf-8').count('the raven')

If instead you want to return the total number of lines in which 'the raven' appears at all, you can do:

for line in r.iter_lines():
    if 'the raven' in line.decode('utf-8'):
        count += 1

You may need to choose a different encoding, depending on your data source.
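The two variants above can be put side by side as a rough sketch. The helper names `count_occurrences` and `count_matching_lines` and the `encoding` parameter are hypothetical, and the sample list only stands in for what `r.iter_lines()` would yield, so the snippet runs without a network request:

```python
def count_occurrences(lines, phrase, encoding="utf-8"):
    """Total number of times `phrase` appears across all lines."""
    return sum(line.decode(encoding).count(phrase) for line in lines)


def count_matching_lines(lines, phrase, encoding="utf-8"):
    """Number of lines that contain `phrase` at least once."""
    return sum(1 for line in lines if phrase in line.decode(encoding))


# Made-up bytes lines standing in for r.iter_lines()
sample = [b"the raven and the raven", b"nothing here", b"Quoth the raven"]

total = count_occurrences(sample, "the raven")
lines_hit = count_matching_lines(sample, "the raven")
print(total, lines_hit)
```

If the source is not UTF-8, pass a different `encoding`, for example `count_occurrences(lines, "the raven", encoding="latin-1")`.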

The following slight edits to your code let you count occurrences of any word, given by the parameter word, in the file at file_url.

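import requests

def count_word(file_url, word):
    # Stream the response so the file is read line by line
    r = requests.get(file_url, stream=True)
    count = 0

    for line in r.iter_lines():
        # iter_lines() yields bytes; decode before counting a str (UTF-8 assumed)
        count += line.decode('utf-8').count(word)

    return count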
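One caveat worth noting: `count` matches substrings, so counting 'the' also picks up 'there' and 'then'. If whole words are what you need, a regular expression with word boundaries is one option. This is only a sketch; `count_whole_word` is a hypothetical helper, and the sample lines are made up so that it runs without fetching anything:

```python
import re


def count_whole_word(lines, word, encoding="utf-8"):
    """Count whole-word matches of `word` across an iterable of bytes lines."""
    # \b anchors the match at word boundaries; re.escape keeps `word` literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(len(pattern.findall(line.decode(encoding))) for line in lines)


# Made-up bytes lines standing in for r.iter_lines()
sample = [b"the raven sat there", b"then the bird left"]

substring_total = sum(line.decode("utf-8").count("the") for line in sample)
whole_word_total = count_whole_word(sample, "the")
print(substring_total, whole_word_total)
```

The same pattern could be used inside count_word by replacing the `.count(word)` call, at the cost of compiling a regular expression.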


 