
How do I print the number of lines in a file that contain a specific word using Python?

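This gives me the number of all the lines:

def links(htmlfile):
    with open(htmlfile, 'r') as infile:
        content = infile.readlines()
    # list.count only matches items exactly equal to '</a>', and each line
    # still ends in '\n', so this rarely matches anything
    print("# of lines: " + str(content.count('</a>')))
    return len(content)  # total number of lines in the file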

But I only need the number of lines that end with </a>.
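
The `content.count('</a>')` call in the question counts only list items that are exactly the string `'</a>'`, and every line returned by `readlines()` still carries its trailing `'\n'`. A small sketch with a hypothetical in-memory list of lines shows the difference between that and a per-line suffix test:

```python
# Hypothetical lines, as readlines() would return them (note the trailing '\n').
lines = ["<a href='x'>one</a>\n", "<p>no link</p>\n", "</a>\n", "<a>two</a>"]

# list.count compares whole elements, so "</a>\n" != "</a>" and nothing matches.
exact_matches = lines.count('</a>')

# Per-line test: drop the newline, then check how the line ends.
ending_matches = sum(1 for line in lines if line.rstrip('\n').endswith('</a>'))

print(exact_matches, ending_matches)  # 0 3
```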

The loop way:

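with open('twolinks.html') as f:
    count = 0
    for line in f:
        # Each line keeps its trailing '\n', so strip it before testing the ending
        if line.rstrip('\n').endswith('</a>'):
            count += 1
print(count)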
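
A minimal sketch wrapping the loop above in a reusable function, assuming the file is UTF-8 text. `count_lines_ending_with` is a hypothetical helper name, and the usage lines write a throwaway temporary file just to have something to read:

```python
import os
import tempfile


def count_lines_ending_with(path, suffix='</a>'):
    """Count lines in the file at `path` whose text ends with `suffix`.

    Trailing whitespace (including the newline) is ignored before the check.
    """
    count = 0
    with open(path, encoding='utf-8') as f:
        for line in f:  # iterating a file yields one line at a time
            if line.rstrip().endswith(suffix):
                count += 1
    return count


# Usage: write a small sample file, count the matching lines, clean up.
sample = "<a href='a'>A</a>\n<p>text</p>\n<a href='b'>B</a>   \n"
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
    path = tmp.name

result = count_lines_ending_with(path)
os.remove(path)
print(result)  # 2
```

Calling `rstrip()` with no argument also drops trailing spaces, so the third sample line still counts; use `rstrip('\n')` instead if trailing spaces should make a line not match.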

Using a generator expression:

with open('twolinks.html') as f:
    count = sum(1 for line in f if line.rstrip('\n').endswith('</a>'))

Or even shorter (summing booleans, treating them as 0s and 1s):

with open('twolinks.html') as f:
    count = sum(line.rstrip('\n').endswith('</a>') for line in f)
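
Summing booleans works because `bool` is a subclass of `int` in Python, so `True` counts as 1 and `False` as 0. A tiny self-contained check, using `io.StringIO` with hypothetical content in place of a real file:

```python
import io

# bool is a subclass of int, so True == 1 and False == 0 when summed.
assert issubclass(bool, int)

# io.StringIO stands in for an open file; iterating it yields lines with '\n'.
f = io.StringIO("<a>1</a>\n<b>2</b>\n<a>3</a>\n")
total = sum(line.rstrip('\n').endswith('</a>') for line in f)
print(total)  # 2
```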

import re

with open('data') as f:
    # re.search finds '</a>' anywhere in the line, not only at the end
    print(sum(1 for line in f if re.search('</a>', line)))

# Counts lines containing '</a>' anywhere (print is a function in Python 3)
with open('file') as f:
    num_lines = sum(1 for line in f if '</a>' in line)
print(num_lines)
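
Both the `re.search` snippet and the `'</a>' in line` snippet count lines that contain `</a>` anywhere, not only at the end. If only the ending matters, the regular expression can be anchored with `$`. A sketch using hypothetical sample lines; the `\s*` before `$` assumes trailing spaces should be ignored:

```python
import re

# </a> followed only by optional whitespace up to the end of the line.
ENDS_WITH_CLOSING_A = re.compile(r'</a>\s*$')

lines = [
    "<a>one</a>\n",           # ends with </a>
    "<a>two</a> and more\n",  # contains </a>, but not at the end
    "<p>none</p>\n",
]

contains = sum(1 for line in lines if '</a>' in line)
ends_with = sum(1 for line in lines if ENDS_WITH_CLOSING_A.search(line))
print(contains, ends_with)  # 2 1
```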

I guess my answer is a bit longer in terms of code, but why not use an HTML parser, since you know you are parsing HTML? For instance:

# Python 3 module name (it was HTMLParser in Python 2)
from html.parser import HTMLParser

# Create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_endtag(self, tag):
        if tag == "a":
            self.count += 1
        print("Encountered an end tag:", tag)
        print(self.count)

# Instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1><a></a></body></html>')
print("Total </a> tags:", parser.count)

This is adapted from the example in the Python documentation. It is easier to extend if you find you need to collect other tags, or the data inside tags.
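
The parser approach counts `</a>` end tags, not lines, so two links on one line count twice. If the number of lines is what's wanted, one option (a sketch, not part of the original answer) is to record `HTMLParser.getpos()`, which returns the current `(lineno, offset)`, each time an `a` end tag is handled. `LinkLineCounter` and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class LinkLineCounter(HTMLParser):
    """Collect the line numbers on which a closing </a> tag appears."""

    def __init__(self):
        super().__init__()
        self.lines_with_end_a = set()

    def handle_endtag(self, tag):
        if tag == 'a':
            lineno, _offset = self.getpos()  # position of the tag being handled
            self.lines_with_end_a.add(lineno)


html = (
    "<p><a href='x'>one</a></p>\n"
    "<p>no link here</p>\n"
    "<a>two</a> <a>three</a>\n"
)
parser = LinkLineCounter()
parser.feed(html)
parser.close()
print(len(parser.lines_with_end_a))  # 2
```

A set is used so that several links on the same line still count as one line.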

Or you could do something like this:

count = 0
with open("file.txt", "r") as f:
    for line in f:
        # Strip the newline first, then compare the last four characters
        if line.rstrip('\n')[-4:] == '</a>':
            count += 1
print(count)

Worked great for me.

In general, you go through the file one line at a time and check whether the last characters (without the \n) match </a>. Check whether stripping the \n gives you any trouble.
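
The order of slicing and stripping matters: taking the last five characters before stripping assumes every line ends in `'\n'`, but the last line of a file often has no trailing newline. A sketch comparing slice-then-strip with strip-then-check on two hypothetical lines; both function names are illustrative:

```python
def slice_then_strip(line):
    # Take the last 5 characters, then drop a newline (assumes '\n' is present).
    return line[-5:].rstrip('\n') == '</a>'


def strip_then_check(line):
    # Drop the newline first, then test the ending; works with or without '\n'.
    return line.rstrip('\n').endswith('</a>')


with_newline = "<a>link</a>\n"
last_line_no_newline = "<a>link</a>"

print(slice_then_strip(with_newline), strip_then_check(with_newline))  # True True
print(slice_then_strip(last_line_no_newline), strip_then_check(last_line_no_newline))  # False True
```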
