
How do we print the line numbers in which a word appears from a text file in Python?

I need this to print the corresponding line numbers from the text file.

def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines =  infile.readlines()
    words = []
    dic = {}

    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:

                dic[words[i][j]] = i

    return dic

The result:

In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])

Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}

(The words above appear on several lines, but it only prints one line number for each, and for some words it doesn't print anything at all. Also, it does not count the empty lines in the text file: 8 should actually be 9, because there's an empty line that it isn't counting.)

Please tell me how to fix this.

def index(filename, lst):
    with open(filename, 'r') as infile:  # use the filename parameter; the file is closed automatically
        lines = infile.readlines()
    words = []
    dic = {}

    for line in lines:
        line_words = line.split()  # split() with no argument also drops the trailing newline
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                if words[i][j] not in dic:
                    dic[words[i][j]] = set()
                dic[words[i][j]].add(i + 1)  # range starts from 0, so shift to 1-based
    return dic

Using a set instead of a list is useful in cases where the word appears several times on the same line.
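A tiny illustration of that difference, using a made-up line rather than the real raven.txt (the line number 1 is an assumption for the demo):

```python
# A made-up line in which "raven" appears twice; pretend it is line 1.
line_number = 1
words = "raven raven nevermore".split()

as_list = []
as_set = set()
for word in words:
    if word == "raven":
        as_list.append(line_number)  # a list records every occurrence, so 1 is stored twice
        as_set.add(line_number)      # a set keeps each line number only once

print(as_list)          # [1, 1]
print(sorted(as_set))   # [1]
```

One trade-off: a set does not preserve insertion order, so wrapping it in `sorted()` when printing gives the line numbers in ascending order.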

Use defaultdict to build a list of line numbers for each word:

from collections import defaultdict
def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)

    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers
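To try the defaultdict version without the real raven.txt, here is a sketch that writes some made-up sample text (not the actual poem) to a temporary file and calls the same function on it. The second line of the sample is deliberately empty, which shows that empty lines are still numbered:

```python
import os
import tempfile
from collections import defaultdict

def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)  # a missing key starts out as an empty list

    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers

# Made-up sample text; line 2 is intentionally empty.
sample = "the raven spoke\n\nonce more the raven\nghost\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    result = index(path, ['raven', 'ghost', 'demon'])
finally:
    os.remove(path)  # clean up the temporary file

print(dict(result))  # {'raven': [1, 3], 'ghost': [4]}
```

Note that 'demon' never becomes a key because it is never looked up in the defaultdict. Reading `result['demon']` later, however, would silently insert an empty list, so converting the result with `dict()` before handing it out can avoid surprises.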

You can also use dict.setdefault to either start a new list for each word or append to an existing list if that word has already been found:

def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst) 
    dic = {}

    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)

    return dic
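In isolation, `dict.setdefault` behaves as follows; the words and line numbers here are made up for the demo:

```python
dic = {}

# 'raven' is missing: setdefault inserts [] and returns that same list, which we append to.
dic.setdefault('raven', []).append(3)
# 'raven' now exists: setdefault ignores the default and returns the existing list.
dic.setdefault('raven', []).append(7)
# 'ghost' is missing, so it gets its own fresh list.
dic.setdefault('ghost', []).append(9)

existing = dic.setdefault('raven', ['ignored'])  # returns the stored list, default unused

print(dic)       # {'raven': [3, 7], 'ghost': [9]}
print(existing)  # [3, 7]
```

A small design difference from defaultdict: the `[]` argument is evaluated on every call, even when the key already exists, whereas defaultdict only calls its factory for missing keys. In return, `setdefault` gives you a plain dict with no auto-insertion on lookup.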

Your two main problems can be fixed as follows:

1.) Multiple indices: you need to initialize/assign a list as the dict value instead of just a single int. Otherwise, each word is reassigned a new index every time another line containing that word is found.

2.) Empty lines SHOULD be read as lines, so I think it's just an indexing issue: your first line is indexed as 0, since a range starts at 0.
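Both points can be seen on a few made-up lines (an assumption standing in for raven.txt), where "raven" appears on the first and third lines and the second line is empty:

```python
# Made-up sample: "raven" is on lines 1 and 3; line 2 is empty.
lines = ["the raven\n", "\n", "raven again\n"]

overwritten = {}   # mimics the question's dic[word] = i
accumulated = {}   # point 1: keep a list per word

for i in range(len(lines)):
    for word in lines[i].split():
        if word == "raven":
            overwritten[word] = i            # single int: each hit replaces the previous one
            if word not in accumulated:
                accumulated[word] = []
            accumulated[word].append(i + 1)  # point 2: shift the 0-based index to 1-based

print(overwritten)  # {'raven': 2}
print(accumulated)  # {'raven': [1, 3]}
print(len(lines))   # 3, because the empty line is still read as a line
```

The overwritten version keeps only the last match, and even that is off by one; the list version keeps every match with 1-based numbering.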

You can simplify your program as follows:

def index(filename, lst):
    wordinds = {key: [] for key in lst}  # initializes an empty list for each word
    with open(filename, 'r') as infile:  # why take a filename param if you hardcode the open?
        # the with statement is useful: it closes the file for you
        for linenum, line in enumerate(infile):
            for word in line.rstrip().split():  # strip the newline and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)

    return {word: nums for word, nums in wordinds.items() if nums}  # filters out empty lists

This simplifies everything into a single loop over the enumerated lines. If you want the first line to be 1 and the second line to be 2, change wordinds[word].append(linenum) to wordinds[word].append(linenum + 1).
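The `line.rstrip().split()` call in this version matters more than it looks, and it may explain why some words never show up. The question's `line.split(' ')` keeps the trailing newline attached to the last word, so that word never equals an entry in the list. If words are still missing after that, one further possibility (an assumption, since we cannot see raven.txt) is capitalization or punctuation; the `normalize` helper below is hypothetical, not part of any answer above:

```python
import string

line = 'Ghastly grim and ancient Raven,\n'  # made-up sample line

print(line.split(' '))        # ['Ghastly', 'grim', 'and', 'ancient', 'Raven,\n']
print(line.rstrip().split())  # ['Ghastly', 'grim', 'and', 'ancient', 'Raven,']

def normalize(word):
    """Hypothetical helper: trim surrounding punctuation and lowercase."""
    return word.strip(string.punctuation).lower()

print([normalize(w) for w in line.split()])  # ['ghastly', 'grim', 'and', 'ancient', 'raven']
```

If you normalize the words from the file, normalize the search list the same way so both sides compare equal.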

EDIT: someone made a good point in another answer: use enumerate(infile, 1) to start your enumeration at 1. That's much cleaner.
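The second argument of `enumerate` is its `start` value, so the two approaches produce the same numbers. Here a short made-up list stands in for the open file, with an empty second line to show it is still numbered:

```python
# Made-up lines standing in for an open file; the second line is empty.
lines = ["the raven\n", "\n", "raven again\n"]

shifted = [linenum + 1 for linenum, _ in enumerate(lines)]   # 0-based, then + 1
one_based = [linenum for linenum, _ in enumerate(lines, 1)]  # start counting at 1

print(shifted)    # [1, 2, 3]
print(one_based)  # [1, 2, 3]
```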

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 