How to extract a limited number of lines after a specific keyword using Python
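I have a text file, and I need to extract the first five lines of the paragraph in which a specified keyword appears.
I am able to find the keyword, but I cannot write out the next five lines starting from that keyword.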
mylines = []
with open('D:\\Tasks\\Task_20\\txt\\CV (4).txt', 'rt') as myfile:
    for line in myfile:
        mylines.append(line)
for element in mylines:
    print(element, end='')
print(mylines[0].find("P"))
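If anyone has any idea how to do this, please help.
Sample input text file:
Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.
Training Objectives:: To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.
Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007
Required output:
Training Objectives:: To have international cultural exposure and hands-on experience in the field of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality management skills and become globally competitive.
# I have to search the text file for the "Training Objectives" keyword and, once it is found, write out only the next 5 lines.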
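If you just want to extract the whole "Training Objectives" block, look for the keyword and keep appending lines until you reach an empty line (or some other suitable marker, such as the next header).
(Edited to handle multiple files and keywords)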
def extract_block(filename, keywords):
    mylines = []
    with open(filename) as myfile:
        save_flag = False
        for line in myfile:
            # start saving at a line that begins with any keyword
            if any(line.startswith(kw) for kw in keywords):
                save_flag = True
            # stop saving at the first blank line
            elif line.strip() == '':
                save_flag = False
            if save_flag:
                mylines.append(line)
    return mylines

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
keywords = ['keyword1', 'keyword2', 'keyword3']
for filename in filenames:
    block = extract_block(filename, keywords)
This assumes there is only one block in each file. If you are extracting multiple blocks from each file, it gets more complicated.
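For the multiple-block case, one possible sketch is to keep each block in its own list instead of a single flat list. The function name `extract_blocks` and the sample lines are hypothetical and only for illustration; the start and stop rules (keyword at the start of a line, blank line as terminator) are the same assumptions as in the function above.

```python
def extract_blocks(lines, keywords):
    """Return a list of blocks; each block is a list of lines.

    A block starts at a line beginning with one of the keywords and
    ends at the next blank line, the next keyword line, or end of input.
    """
    blocks = []
    current = None  # the block being collected, or None when outside a block
    for line in lines:
        if any(line.startswith(kw) for kw in keywords):
            if current is not None:
                blocks.append(current)  # a new keyword closes the previous block
            current = [line]
        elif line.strip() == '':
            if current is not None:
                blocks.append(current)
                current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append(current)  # input ended while still inside a block
    return blocks


# Hypothetical sample data; an open file object can be passed the same way.
sample = [
    "Training Objectives::\n",
    "To have international cultural exposure.\n",
    "\n",
    "Unrelated line\n",
    "Education Institution Name: SOUTHVILLE\n",
]
blocks = extract_blocks(sample, ["Training Objectives", "Education Institution Name"])
for block in blocks:
    print(''.join(block), end='\n')
```

Because the function accepts any iterable of lines, it can be used as `extract_blocks(myfile, keywords)` inside a `with open(...)` statement.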
If you really want exactly 5 lines every time, you can do something similar but add a counter to count off your 5 lines.
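A minimal sketch of that counter idea follows. It assumes the keyword line itself counts as the first of the 5 lines and that only the first occurrence matters; the name `extract_n_lines` and the sample data are made up for illustration.

```python
def extract_n_lines(lines, keyword, n=5):
    """Return the first line containing keyword plus the lines after it,
    at most n lines in total (fewer if the input ends first)."""
    collected = []
    counting = False
    for line in lines:
        if not counting and keyword in line:
            counting = True  # keyword found: start collecting from this line
        if counting:
            collected.append(line)
            if len(collected) == n:  # the counter: stop after n lines
                break
    return collected


# Hypothetical sample data standing in for the lines of a file.
sample = ["line %d\n" % i for i in range(8)]
sample[1] = "Training Objectives::\n"
result = extract_n_lines(sample, "Training Objectives")
print(''.join(result), end='')
```

With a real file, the call would look like `extract_n_lines(myfile, 'Training Objectives')` inside a `with open(...) as myfile:` block, which stops reading as soon as the fifth line has been collected.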
Try this:
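with open('test.txt') as f:
    content = f.readlines()
# indices of every line that contains the keyword (case-insensitive)
index = [x for x in range(len(content)) if 'training objectives' in content[x].lower()]
for num in index:
    # the keyword line plus the 4 lines after it
    for lines in content[num:num+5]:
        print(lines, end='')  # each line already ends with a newline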
If you only have a few words (just to get the indices):
index = []
for i, line in enumerate(content):
    if 'hello' in line or 'there' in line:  # add your "or <word> in line" here
        index.append(i)
print(index)
If you have many words (just to get the indices):
words = ["hello", "there", "blink"]  # insert your words here
index = []
for i, line in enumerate(content):
    # any() records each line once, even if it contains several of the words
    if any(word in line for word in words):
        index.append(i)
print(index)
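It depends on where your \n characters are, but I put together a regular expression that may help. Here is an example of what my text looks like in the variable st: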
In [254]: st
Out[254]: 'Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n\nTraining Objectives::\nTo have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\nEducation Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007\n'
import re
re.findall('Training Objectives:.*\n((?:.*\n){1,5})', st)
Out[255]: ['To have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\n']
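Note that the pattern above also captures the blank lines that follow the block (the trailing `\n\n\n` in the output). As a variant sketch, assuming you would rather stop at the first blank line, `.+` can be used in place of `.*` inside the repeated group so that empty lines do not match. The helper name `lines_after_heading` and the sample text are hypothetical.

```python
import re


def lines_after_heading(text, heading, max_lines=5):
    """Return up to max_lines non-empty lines that directly follow the
    line containing heading, stopping at the first blank line."""
    # .+ needs at least one character, so an empty line ends the capture.
    pattern = re.escape(heading) + r'.*\n((?:.+\n){1,' + str(max_lines) + r'})'
    match = re.search(pattern, text)
    if match is None:
        return []
    return [ln.rstrip() for ln in match.group(1).splitlines()]


# Hypothetical sample text shaped like the st variable above.
text = ("Agency: EXAMPLE\n\n"
        "Training Objectives::\n"
        "First line \n"
        "Second line\n"
        "\n\n"
        "Education: X\n")
found = lines_after_heading(text, "Training Objectives")
print(found)
```

The `re.escape` call is there so that a heading containing regex metacharacters (for example parentheses) is matched literally.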