
How to extract limited lines of data from specific keyword using python

I have a text file from which I need to extract the first five lines once a specified keyword occurs in a paragraph.

I am able to find the keyword, but I am not able to write the next five lines after it.

mylines = []
with open('D:\\Tasks\\Task_20\\txt\\CV (4).txt', 'rt') as myfile:
    for line in myfile:
        mylines.append(line)
    for element in mylines:
        print(element, end='')
print(mylines[0].find("P"))

Please help if anybody has any idea how to do this.

Input Text File Example:-

Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.

Training Objectives: : To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.

Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007

Required Output:-

Training Objectives: : To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.


I have to search for the "Training Objectives" keyword in the text file and, once it is found, write only the next 5 lines.

If you're simply trying to extract the entire "Training Objectives" block, look for the keyword and keep appending lines until you hit an empty line (or some other suitable marker, the next header for example).

(edited to handle multiple files and keywords)

def extract_block(filename, keywords):
    mylines = []
    with open(filename) as myfile:
        save_flag = False
        for line in myfile:
            if any(line.startswith(kw) for kw in keywords):
                save_flag = True
            elif line.strip() == '':
                save_flag = False
            if save_flag:
                mylines.append(line)
    return mylines

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
keywords = ['keyword1', 'keyword2', 'keyword3']
for filename in filenames:
    block = extract_block(filename, keywords)

This assumes there is only 1 block that you want in each file. If you're extracting multiple blocks from each file, it would get more complicated.
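For the multiple-block case, here is a rough sketch of one way it could go. The function name extract_blocks is made up for illustration, and it assumes (as above) that a block starts at a line beginning with a keyword and ends at the next empty line or the next keyword line. It takes any iterable of lines, so an open file object should work as well as a list.

```python
def extract_blocks(lines, keywords):
    """Return a list of blocks; each block is a list of lines.

    Hypothetical helper: a block starts at a line that begins with one of
    the keywords and ends at an empty line or at the next keyword line.
    """
    blocks = []
    current = None  # the block being collected, or None when outside a block
    for line in lines:
        if any(line.startswith(kw) for kw in keywords):
            if current is not None:
                blocks.append(current)  # a new header closes the previous block
            current = [line]
        elif line.strip() == '':
            if current is not None:
                blocks.append(current)  # an empty line closes the block
                current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append(current)  # input ended while still inside a block
    return blocks
```

Each file then gives you a list of blocks instead of one flat list of lines, e.g. blocks = extract_blocks(myfile, keywords) inside the with block.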

If you really want 5 lines, always and every time, then you could do something similar but add a counter to count out your 5 lines.
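A minimal sketch of that counter idea follows. The name extract_n_lines is hypothetical, and I'm assuming the keyword line itself counts as the first of the 5 lines; adjust if you want the 5 lines after it instead.

```python
def extract_n_lines(lines, keyword, n=5):
    """Collect n lines starting at each line that begins with keyword.

    Assumption: the keyword line itself is counted as the first line.
    """
    collected = []
    remaining = 0  # how many lines are still to be collected
    for line in lines:
        if remaining == 0 and line.startswith(keyword):
            remaining = n  # start the countdown at a keyword line
        if remaining > 0:
            collected.append(line)
            remaining -= 1
    return collected
```

Since it only iterates over lines, you can pass the open file object directly: with open(filename) as myfile: block = extract_n_lines(myfile, 'Training Objectives').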

Try this:

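with open('test.txt') as f:
    content = f.readlines()
# indices of all lines that contain the keyword (case-insensitive)
index = [i for i, line in enumerate(content) if 'training objectives' in line.lower()]
for num in index:
    # each line already ends with '\n', so suppress print's own newline
    for line in content[num:num+5]:
        print(line, end='')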

If you have only a few words (just to get the index):

index = []
for i, line in enumerate(content):
    if 'hello' in line or 'there' in line:     # add further "or 'word' in line" checks here
        index.append(i)
print(index)

If you have many (just to get the index):

words = ["hello", "there", "blink"]    # insert your words here
index = []
for i, line in enumerate(content):
    # any() records each matching line once, even if several words occur in it
    if any(word in line for word in words):
        index.append(i)
print(index)
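To tie the index lookup back to the 5-line slice, something along these lines might work. This is only a sketch: slices_after_keywords is a made-up name, content is assumed to be the list returned by readlines(), and matching is done case-insensitively.

```python
def slices_after_keywords(content, words, n=5):
    """Map each matching line index to the n lines starting at that index.

    Hypothetical helper: content is a list of lines, words a list of
    keywords; a line matches if it contains any keyword, ignoring case.
    """
    result = {}
    for i, line in enumerate(content):
        lowered = line.lower()
        if any(word.lower() in lowered for word in words):
            # slicing past the end of the list is safe; it just returns fewer lines
            result[i] = content[i:i + n]
    return result
```

You can then loop over the returned dict and print or write each group of lines.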

It depends on where your \n's are, but I put together a regex that might help, along with a sample of how my text looks in the variable st:

In [254]: st                                                                                  

Out[254]: 'Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n\nTraining Objectives::\nTo have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\nEducation Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007\n'

import re

re.findall('Training Objectives:.*\n((?:.*\n){1,5})', st)   

Out[255]: ['To have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\n']
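Note that the result above also picks up the trailing blank lines, because .* matches empty lines too. If you'd rather stop at the first blank line, a variant using .+ could look like the sketch below. The function name block_after_keyword and the max_lines parameter are my own, not part of the re module.

```python
import re

def block_after_keyword(text, keyword, max_lines=5):
    """Return up to max_lines non-empty lines that follow the keyword line.

    Hypothetical helper: '.+' needs at least one character per line, so
    the match stops at the first blank line.
    """
    pattern = re.escape(keyword) + r'.*\n((?:.+\n){1,' + str(max_lines) + r'})'
    match = re.search(pattern, text)
    return match.group(1) if match else ''
```

Called as block_after_keyword(st, 'Training Objectives:'), this should return just the three text lines of the block without the empty lines after it.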
