How to extract a limited number of lines after a specific keyword using Python
I have a text file from which I need to extract the first five lines once a specified keyword occurs in a paragraph.
I am able to find the keyword, but I am not able to write out the next five lines after it.
mylines = []
with open('D:\\Tasks\\Task_20\\txt\\CV (4).txt', 'rt') as myfile:
    for line in myfile:
        mylines.append(line)
for element in mylines:
    print(element, end='')
print(mylines[0].find("P"))
Please help if anybody has an idea of how to do this.
Input text file example:
Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.
Training Objectives: : To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.
Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007
Required output:
Training Objectives: : To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.
I have to search for the keyword "Training Objectives" in the text file, and once it is found, only the next 5 lines should be written out.
If you're simply trying to extract the entire "Training Objectives" block, look for the keyword and keep appending lines until you hit an empty line (or some other suitable marker, the next header for example).
(edited to handle multiple files and keywords)
def extract_block(filename, keywords):
    mylines = []
    with open(filename) as myfile:
        save_flag = False
        for line in myfile:
            if any(line.startswith(kw) for kw in keywords):
                save_flag = True    # a keyword line starts the block
            elif line.strip() == '':
                save_flag = False   # a blank line ends the block
            if save_flag:
                mylines.append(line)
    return mylines

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
keywords = ['keyword1', 'keyword2', 'keyword3']
for filename in filenames:
    block = extract_block(filename, keywords)
This assumes there is only 1 block that you want in each file. If you're extracting multiple blocks from each file, it gets more complicated, because the function above appends every matching block to the same list.
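For the multiple-block case, one possible sketch keeps each block in its own list instead of one flat list. The name `extract_blocks` and the sample lines are hypothetical, and the sketch makes the same assumption as `extract_block`: a block starts at a line that begins with a keyword and ends at a blank line.

```python
def extract_blocks(lines, keywords):
    """Return a list of blocks; each block is a list of lines."""
    blocks = []
    current = None  # the block being collected, or None when outside a block
    for line in lines:
        if any(line.startswith(kw) for kw in keywords):
            # A keyword line starts a new block (and closes any open one).
            if current is not None:
                blocks.append(current)
            current = [line]
        elif line.strip() == '':
            # A blank line ends the current block.
            if current is not None:
                blocks.append(current)
                current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append(current)  # input ended while still inside a block
    return blocks


# Hypothetical sample input with two separate blocks.
sample = [
    "Training Objectives: first\n",
    "more text\n",
    "\n",
    "Other header\n",
    "\n",
    "Training Objectives: second\n",
]
blocks = extract_blocks(sample, ["Training Objectives"])
print(len(blocks))
```

Because the function only iterates over `lines`, an open file object can be passed in place of the list.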
If you really want exactly 5 lines every time, you could do something similar but add a counter to count out your 5 lines.
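A minimal sketch of the counter idea follows. The function name `extract_n_lines` is hypothetical, and it assumes that the keyword line itself counts as the first of the 5 lines and that only the first occurrence of the keyword matters; adjust either assumption to your needs.

```python
def extract_n_lines(lines, keyword, n=5):
    """Return the first line containing keyword plus the lines after it, n in total."""
    result = []
    remaining = 0  # how many more lines still have to be saved
    for line in lines:
        # Start counting at the first keyword line only.
        if remaining == 0 and not result and keyword in line:
            remaining = n
        if remaining > 0:
            result.append(line)
            remaining -= 1
    return result


# Hypothetical sample: ten lines, with the keyword on the third one.
sample = [f"line {i}\n" for i in range(10)]
sample[2] = "Training Objectives: example\n"
result = extract_n_lines(sample, "Training Objectives")
print(len(result))
```

If fewer than `n` lines remain after the keyword, the function simply returns what is there.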
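Try this:
with open('test.txt') as f:
    content = f.readlines()

# Indices of the lines that contain the keyword (case-insensitive).
index = [x for x in range(len(content)) if 'training objectives' in content[x].lower()]
for num in index:
    # Print the keyword line and the four lines after it.
    for lines in content[num:num+5]:
        print(lines, end='')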
If you have only a few words (just to get the index):
index = []
for i, line in enumerate(content):
    if 'hello' in line or 'there' in line:  # add more "or <word> in line" checks here
        index.append(i)
print(index)
If you have many (just to get the index):
words = ["hello", "there", "blink"]  # insert your words here
index = []
for i, line in enumerate(content):
    for item in words:
        if item in line:
            index.append(i)
            break  # record each line only once
print(index)
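Once the indices are known, they can be used to slice out the lines that follow each match. The sketch below is one way to do that; `content`, `words`, and `n` hold hypothetical sample values here so that the block runs on its own.

```python
# Hypothetical sample standing in for the lines read from the file.
content = [
    "hello world\n",
    "a\n",
    "b\n",
    "there you go\n",
    "c\n",
]
words = ["hello", "there", "blink"]

# Indices of lines that contain at least one of the words.
index = [i for i, line in enumerate(content) if any(w in line for w in words)]

# Map each matching index to that line plus the next n - 1 lines.
n = 2
snippets = {i: content[i:i + n] for i in index}
print(index)
```

Slicing past the end of the list is safe in Python, so a match near the end of the file just yields a shorter snippet.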
It depends on where your \n's are, but I put a regex together that might help. Here is a sample of how my text looks in the variable st:
In [254]: st
Out[254]: 'Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n\nTraining Objectives::\nTo have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\nEducation Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007\n'
import re
re.findall('Training Objectives:.*\n((?:.*\n){1,5})', st)
Out[255]: ['To have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\n']
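Note that the pattern above also captures the blank lines that follow the block, because `.*` matches an empty line. A possible variation, sketched here under the assumption that the block ends at the first completely empty line, uses `.+` instead. The `text` value is a shortened, hypothetical stand-in for `st`.

```python
import re

# Shortened stand-in for the `st` string shown above (hypothetical sample).
text = (
    "Partner Agency: EXAMPLE\n"
    "\n"
    "Training Objectives::\n"
    "To have international cultural exposure\n"
    "To develop my hospitality management skills\n"
    "\n"
    "\n"
    "Education Institution Name: EXAMPLE\n"
)

# `.+` needs at least one character, so the match stops at the first empty line.
pattern = re.compile(r'Training Objectives:.*\n((?:.+\n){1,5})')
match = pattern.search(text)
block = match.group(1).splitlines() if match else []
print(block)
```

A line that contains only spaces still counts as non-empty for `.+`, so strip such lines first if your file has them.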