
How to extract a limited number of lines after a specific keyword using Python

I have a text file from which I need to extract the first five lines once a specified keyword occurs in a paragraph.

I can find the keyword, but I can't work out how to write out the next five lines after it.

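mylines = []
# Read every line of the CV text file into a list
with open('D:\\Tasks\\Task_20\\txt\\CV (4).txt', 'rt') as myfile:
    for line in myfile:
        mylines.append(line)
    # Echo the file; each line already ends with '\n'
    for element in mylines:
        print(element, end='')

# Position of the first "P" in the first line (-1 if there is none)
print(mylines[0].find("P"))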

Please help if anyone has an idea of how to do this.

Example input text file:

Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.

Training Objectives: : To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.

Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007

Required output:

Training Objectives: : To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.


I need to search the text file for the "Training Objectives" keyword and, once it is found, write out only the next 5 lines.

If you're simply trying to extract the entire "Training Objectives" block, look for the keyword and keep appending lines until you hit an empty line (or some other suitable marker, for example the next header).

(Edited to handle multiple files and keywords.)

def extract_block(filename, keywords):
    """Return the lines from a keyword line up to (not including) the next blank line."""
    mylines = []
    with open(filename) as myfile:
        save_flag = False
        for line in myfile:
            if any(line.startswith(kw) for kw in keywords):
                save_flag = True   # a keyword line starts the block
            elif line.strip() == '':
                save_flag = False  # a blank line ends the block
            if save_flag:
                mylines.append(line)
    return mylines

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
keywords = ['keyword1', 'keyword2', 'keyword3']
for filename in filenames:
    block = extract_block(filename, keywords)
    print(''.join(block), end='')

This assumes there is only one block you want in each file. If a file contains several, the lines of every matching block end up concatenated in one flat list, so extracting multiple blocks separately gets more complicated.
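If a file can contain several matching blocks and you want them kept apart, one option is to start a fresh list at each keyword line instead of using a single on/off flag. This is a minimal sketch that assumes the same blank-line convention as `extract_block` above. The helper name `extract_blocks` and the line breaks in the sample text are hypothetical and made up for illustration.

```python
def extract_blocks(lines, keywords):
    """Split `lines` into separate blocks.

    A block starts at a line beginning with any of `keywords` and runs up to
    (but not including) the next blank line or the next keyword line.
    """
    blocks = []
    current = None  # None means we are not inside a wanted block
    for line in lines:
        if any(line.startswith(kw) for kw in keywords):
            current = [line]          # a keyword line opens a new block
            blocks.append(current)
        elif line.strip() == '':
            current = None            # a blank line closes the current block
        elif current is not None:
            current.append(line)      # still inside a block: keep the line
    return blocks


# Hypothetical layout of the question's example text (assumed line breaks)
sample = (
    "Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n"
    "\n"
    "Training Objectives::\n"
    "To have international cultural exposure and hands-on experience\n"
    "To develop my hospitality management skills and become globally competitive.\n"
    "\n"
    "Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES\n"
    "Institution start date: (June 2007\n"
).splitlines(keepends=True)

blocks = extract_blocks(sample, ['Training Objectives', 'Education Institution Name'])
for block in blocks:
    print(''.join(block), end='')
    print('---')
```

With a real file, pass the open file object instead of `sample`, for example `with open(filename) as f: blocks = extract_blocks(f, keywords)`.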

If you really want exactly 5 lines every time, you could do something similar but add a counter that counts off the 5 lines.
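Here is a minimal sketch of that counter idea. It assumes you want the 5 lines that come after the keyword line, not including the keyword line itself. The name `extract_n_lines` and the line breaks in the sample text are hypothetical.

```python
def extract_n_lines(lines, keyword, n=5):
    """Return the n lines that follow each line containing `keyword`.

    A keyword that appears inside an already-running window is collected as an
    ordinary line rather than restarting the count.
    """
    result = []
    remaining = 0  # lines still to collect after the most recent match
    for line in lines:
        if remaining > 0:
            result.append(line)
            remaining -= 1
        elif keyword in line:
            remaining = n  # start counting from the next line
    return result


# Hypothetical layout of the question's example text (assumed line breaks)
sample = (
    "Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n"
    "\n"
    "Training Objectives::\n"
    "To have international cultural exposure and hands-on experience \n"
    "in the field of hospitality management as a gateway to a meaningful hospitality career. \n"
    "To develop my hospitality management skills and become globally competitive.\n"
    "\n"
    "\n"
    "Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES\n"
).splitlines(keepends=True)

print(''.join(extract_n_lines(sample, 'Training Objectives')), end='')
```

Blank lines count toward the five here. If that is not what you want, only decrement the counter for lines where `line.strip()` is non-empty.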

Try this:

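with open('test.txt') as f:
    content = f.readlines()
# Indices of every line that mentions the keyword (case-insensitive)
index = [x for x in range(len(content)) if 'training objectives' in content[x].lower()]
for num in index:
    # The matching line plus the four lines after it (5 lines in total)
    for lines in content[num:num+5]:
        print(lines, end='')  # each line already ends with '\n'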
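The slice above includes the matching line itself. If you want the five lines after the keyword and would rather not load the whole file with `readlines()`, another option is to pull lines lazily from the file iterator with `itertools.islice`. This is a different technique from the answer above. The sketch below uses a hypothetical helper name, `next_lines_after`, and uses `io.StringIO` as a stand-in for a real file.

```python
import io
from itertools import islice


def next_lines_after(lines, keyword, n=5):
    """Yield a list of the n lines that follow each line containing `keyword`.

    Matching is case-insensitive. Lines consumed by islice are not
    re-checked for the keyword.
    """
    it = iter(lines)  # a file object is already its own iterator
    for line in it:
        if keyword.lower() in line.lower():
            # islice advances the same iterator, so the for loop
            # resumes after the n lines taken here
            yield list(islice(it, n))


# Stand-in for open('test.txt'); io.StringIO behaves like a text file
demo = io.StringIO(
    "Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n"
    "Training Objectives::\n"
    "line 1\nline 2\nline 3\nline 4\nline 5\nline 6\n"
)
chunks = list(next_lines_after(demo, 'training objectives'))
for chunk in chunks:
    print(''.join(chunk), end='')
```

To write the result out instead of printing it, open an output file in `'w'` mode alongside the input and call `out.writelines(chunk)` for each chunk.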

If you have only a few keywords (this just collects the matching line indices):

index = []
for i, line in enumerate(content):
    if 'hello' in line or 'there' in line:     # add more "or 'word' in line" checks here
        index.append(i)
print(index)

If you have many keywords (again, this just collects the indices):

words = ["hello", "there", "blink"]    # insert your words here
index = []
for i, line in enumerate(content):
    # any() stops at the first match, so each line index is added at most once
    if any(word in line for word in words):
        index.append(i)
print(index)
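Once you have the indices, getting the lines that follow each match is just a matter of slicing `content`. The sketch below combines both steps for several keywords. `lines_after_keywords` is a hypothetical name, and `content` is assumed to be a list of lines such as the one returned by `f.readlines()`. The small sample list is made up for illustration.

```python
def lines_after_keywords(content, words, n=5):
    """Map each word to a list of n-line slices, one per line containing it.

    Slicing past the end of the list is safe in Python: a match near the end
    simply yields fewer than n lines.
    """
    found = {word: [] for word in words}
    for i, line in enumerate(content):
        for word in words:
            if word in line:
                # content[i + 1:i + 1 + n] skips the matching line itself
                found[word].append(content[i + 1:i + 1 + n])
    return found


# Hypothetical sample lines
content = ["hello world\n", "a\n", "b\n", "there it is\n", "c\n"]
result = lines_after_keywords(content, ["hello", "there", "blink"], n=2)
print(result)
```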

It depends on where your \n characters are, but I put together a regex that might help. Here is a sample of how I assume your text looks, stored in the variable st:

In [254]: st                                                                                  

Out[254]: 'Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n\nTraining Objectives::\nTo have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\nEducation Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007\n'

import re

In [255]: re.findall(r'Training Objectives:.*\n((?:.*\n){1,5})', st)

Out[255]: ['To have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\n']
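The same regex can be wrapped in a small function that takes the keyword and line count as parameters and runs over the whole file's text. In the sketch below, `regex_next_lines` is a hypothetical name. `re.escape` ensures that characters in the keyword such as `(` or `.` are matched literally. Because every repetition needs a trailing `\n`, a final line without a newline is not captured.

```python
import re


def regex_next_lines(text, keyword, n=5):
    """Return, for each occurrence of `keyword`, up to n lines after its line.

    Each result is one string, as re.findall returns for a single capture group.
    """
    pattern = re.escape(keyword) + r'.*\n((?:.*\n){1,' + str(n) + '})'
    return re.findall(pattern, text)


# Hypothetical sample text
sample = "Training Objectives::\nline 1\nline 2\nline 3\nline 4\nline 5\nline 6\n"
print(regex_next_lines(sample, 'Training Objectives:'))
```

To use it on a file, read the whole text first, for example `with open('test.txt') as f: print(regex_next_lines(f.read(), 'Training Objectives:'))`.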
