How do I print the number of lines in a file that contain a specific word using Python?

Question

This prints out the number of all the lines:

def links(htmlfile):
    infile = open('twolinks.html', 'r')
    content = infile.readlines()
    infile.close()
    return len(content)
    print("# of lines: " + str(content.count('</a>')))

But I only need the number of lines that end with </a>.

Answer 1

The loop way:

with open('twolinks.html') as f:
    count = 0
    for line in f:
        # lines keep their trailing newline, so strip it before testing the ending
        if line.rstrip().endswith('</a>'):
            count += 1
print(count)

Using a generator expression:

with open('twolinks.html') as f:
    count = sum(1 for line in f if line.rstrip().endswith('</a>'))

Or even shorter (summing booleans, treating them as 0s and 1s):

with open('twolinks.html') as f:
    count = sum(line.rstrip().endswith('</a>') for line in f)
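
The variants above can be wrapped in a small reusable function. This is only a minimal sketch: the name count_lines_ending_with and its parameters are made up for illustration, and it assumes that trailing whitespace (including the newline) should be ignored when testing how a line ends.

```python
def count_lines_ending_with(path, suffix='</a>'):
    """Count the lines in `path` that end with `suffix`.

    Hypothetical helper, not part of any library. Trailing whitespace,
    including the newline kept by file iteration, is ignored.
    """
    with open(path) as f:
        # Booleans sum as 0 and 1, so this counts the matching lines
        return sum(line.rstrip().endswith(suffix) for line in f)
```

It could then be called as print("# of lines: " + str(count_lines_ending_with('twolinks.html'))).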

Answer 2

import re

with open('data') as f:
    # re.search matches '</a>' anywhere on the line, not only at the end
    print(sum(1 for line in f if re.search('</a>', line)))
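
Since re.search finds </a> anywhere on a line, the pattern can be anchored when only lines ending with the tag should count. The following is a sketch under two assumptions that the question does not state: trailing whitespace after the tag is acceptable, and the tag may be upper case. The names END_ANCHOR and count_anchor_endings are illustrative.

```python
import re

# '</a>' followed only by optional whitespace up to the end of the line.
# re.IGNORECASE also accepts '</A>' (an assumption about the input).
END_ANCHOR = re.compile(r'</a>\s*$', re.IGNORECASE)

def count_anchor_endings(lines):
    """Count the lines (any iterable of strings) that end with a closing anchor tag."""
    return sum(1 for line in lines if END_ANCHOR.search(line))
```

An open file object can be passed directly, since iterating over it yields its lines.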

Answer 3

with open('file') as f:
    num_lines = sum(1 for line in f if '</a>' in line)
print(num_lines)

Answer 4

My answer is a bit longer in terms of lines of code, but why not use an HTML parser, since you know you are parsing HTML? For instance:

from html.parser import HTMLParser  # Python 3; in Python 2 the module was HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.count = 0

    def handle_endtag(self, tag):
        if tag == "a":
            self.count += 1
        print("Encountered an end tag :", tag)
        print(self.count)

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
        '<body><h1>Parse me!</h1><a></a></body></html>')

This is modified from the example in the Python documentation. It is easier to extend if you later need to collect other tags, or the data inside tags.
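
As an illustration of that kind of extension, here is a sketch that tallies every closing tag instead of only </a>. It is written for Python 3, where the module is html.parser; the class name EndTagCounter and the helper count_end_tags are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

class EndTagCounter(HTMLParser):
    """Tally every closing tag seen, keyed by tag name (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_endtag(self, tag):
        # html.parser passes tag names to the handlers in lower case
        self.counts[tag] += 1

def count_end_tags(html_text):
    """Parse a string of HTML and return a Counter of its closing tags."""
    parser = EndTagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts
```

Note that this counts closing tags in the whole document rather than lines, so several </a> tags on one line are each counted.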

Answer 5

Or you could do something like this:

count = 0
with open("file.txt", "r") as f:
    for line in f:
        # drop the trailing newline first, then compare the last four characters
        if line.rstrip('\n')[-4:] == '</a>':
            count += 1
print(count)

Worked great for me.

In general, you go through the file one line at a time and check whether the last characters (without the \n) match </a>. Check whether the \n stripping gives you any trouble.
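
One place where the \n stripping can matter is the final line of a file that has no trailing newline. The sketch below compares slicing the last five characters before stripping with stripping before testing the ending; the function names and sample strings are made up for illustration.

```python
def slice_then_strip(line):
    # Take the last 5 characters, then remove a trailing newline
    return line[-5:].rstrip('\n') == '</a>'

def strip_then_test(line):
    # Remove a trailing newline first, then test how the line ends
    return line.rstrip('\n').endswith('</a>')

# The second sample has no trailing newline, as a file's last line often does
samples = ['<a>x</a>\n', '<a>x</a>']
results = [(slice_then_strip(s), strip_then_test(s)) for s in samples]
print(results)  # [(True, True), (False, True)]
```

Slicing first misses the line without a newline, because the five-character slice then includes one character before the tag.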

5 answers

solution1
2 2015-10-03 20:45:11

solution2
1 2015-10-03 21:03:25

solution3
1

solution4
0 2015-10-03 21:46:43

solution5
-1 2015-10-03 21:22:26
