How to extract certain data on different lines from a file in Python
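The code for the events looks like this, and that part works. I then tried to modify it to capture the other ids, but it does not work, so it is clearly incorrect.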
with open('GroupEvent/G0.txt') as f:
    lines = f.readlines()
    for i in range(0, len(lines)):
        if lines[i] == '\n':
            nlines = 0
        else:
            line = lines[i]
            entry = line.split()
            for x in entry:
                first_char = x
                EventToMatch = ('E')
                if first_char.startswith(EventToMatch) and nlines == 1:
                    Events.append(first_char)
                    nlines = 2
                    break
                elif nlines == 2:
                    Org.append(first_char)
                    nlines = 3
                elif nlines == 3:
                    Yes.append(first_char)
                    nlines = 4
                elif nlines == 4:
                    No.append(first_char)
                    nlines == 0
                else:
                    break
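OK, so I have a file made up of records like this: the id starting with E on the first line is the id of the event, the second line holds the id of the organizer, the third line the id of the person who accepted the invite, and the fourth the id of the person who rejected it. The file has dozens of such records, separated by blank lines. How can I collect the organizer ids and the ids of the people who said yes and no? I captured the event ids easily because they start with E, and I got an array of event ids. Now I cannot extract the others.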
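One way to make the line-counting idea work is to track the position of each line within its record and reset it at every blank line. Note that `nlines == 0` in the code above is a comparison, not an assignment, so it never resets the counter. The following is only a minimal sketch under assumptions: `parse_records` and `sample` are hypothetical names, the second sample record (`E933`, `M11111`, ...) is invented for illustration, and the id is assumed to be the first whitespace-separated token on each line.

```python
def parse_records(lines):
    """Collect the four ids of each blank-line-separated record by line position."""
    events, organizers, accepted, rejected = [], [], [], []
    targets = [events, organizers, accepted, rejected]
    position = 0  # index of the current line within its record
    for line in lines:
        fields = line.split()
        if not fields:  # a blank line ends the current record
            position = 0
            continue
        if position < len(targets):
            targets[position].append(fields[0])  # assume the id is the first token
        position += 1
    return events, organizers, accepted, rejected


# Sample input shaped like f.readlines(); the second record is made up.
sample = [
    "E932 4 1240153200000\n",
    "M48462\n",
    "M48462\n",
    "M65542\n",
    "\n",
    "E933 2 1240153200000\n",
    "M11111\n",
    "M22222\n",
    "M33333\n",
]
events, organizers, accepted, rejected = parse_records(sample)
print(events, organizers, accepted, rejected)
```

With a real file, the list returned by `f.readlines()` could be passed in place of `sample`.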
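If the file has a fixed structure, I usually use a class, as one would for a FastQ file. I put the following lines into input_file.txt, with a blank line after every record. The class further down returns 5 lines at a time (the four data lines plus the blank separator), and you can do whatever you want with them.
input_file.txt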
E932 4 1240153200000 #id of an event
M48462 #id of organizer
M48462 #id of accepted invite
M65542 #id of rejected invite

E932 4 1240153200000
M48462
M48462
M65542

E932 4 1240153200000
M48462
M48462
M65542

E932 4 1240153200000
M48462
M48462
M65542
The class code to handle it:
class HandleFile:
    def __init__(self, filename):
        self.input = open(filename, "r")  # assuming it is a text file
        self.currentLine = 0

    def __iter__(self):
        return self

    def __next__(self):
        mylist = []
        for i in range(5):  # each record is 4 data lines plus a blank line
            line = self.input.readline()
            self.currentLine += 1
            if line:
                mylist.append(line.strip("\n"))
            else:
                mylist.append(None)  # add None if it is the end of the file
        if mylist.count(None) == 5:  # nothing was read: end of file
            self.input.close()
            raise StopIteration
        # the 5th line must be blank, so the file has to end with a blank line too
        assert mylist[4] == ""
        assert mylist[0].startswith("E")  # or add more conditions
        return mylist


hf = HandleFile("input_file.txt")
for lst in hf:
    print(lst)
This is the output:
...
['E932 4 1240153200000 #id of an event', 'M48462 #id of organizer', 'M48462 #id of accepted invite', 'M65542 #id of rejected invite', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
>>>
Note: this code was modified from here
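Each list printed above still holds whole lines, including the extra event fields and the `#` comments. As a rough sketch, the ids could be pulled out of one such list as follows. `record_ids` is a hypothetical helper, and it assumes the id is always the first whitespace-separated token of a line.

```python
def record_ids(record):
    """Return (event, organizer, accepted, rejected) ids from one 5-item record list."""
    # The id is assumed to be the first token, so trailing fields
    # and "#..." comments on the same line are ignored.
    return tuple(item.split()[0] for item in record[:4])


record = ['E932 4 1240153200000 #id of an event', 'M48462 #id of organizer',
          'M48462 #id of accepted invite', 'M65542 #id of rejected invite', '']
print(record_ids(record))
```

Calling it on every list yielded by the iterator and appending the four values to four separate lists would give the arrays asked for in the question.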
If all you want is lists of the specific ids, one approach I have used in the past is:
# initialize lists of the ids you want
event_id = []
org_id = []
accept_id = []
reject_id = []
# open the file with your data and read all of it
with open("FILENAME.txt", "r") as file:
    content = file.read()
# now split the content at every blank line by using "\n" twice,
# just like hitting return twice when you want a blank line;
# strip() first so a trailing blank line does not produce an empty record
split_content = content.strip().split("\n\n")
# what I found easiest was to first create a list of lists,
# to separate the lines that belong to each record
mylist = [item.split("\n") for item in split_content]
# now append to the lists you made at the beginning
# the content you want associated with each of them
for e in mylist:
    event_id.append(e[0])
    org_id.append(e[1])
    accept_id.append(e[2])
    reject_id.append(e[3])
# now all your ids are separated into their respective lists
# you can also append them to a separate file if you would like, like this
# (INDEX_OF_ELEMENT is a placeholder for the index you want)
with open("FILENAME.txt", "a+") as file_to_append:
    file_to_append.write(e[INDEX_OF_ELEMENT])
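The same blank-line split can be tried without a file. The sketch below is an illustration under assumptions, not the answer's exact method: it works on an in-memory string `text` (a stand-in for `file.read()`), uses `re.split` so that blank lines containing stray spaces still separate records, and keeps only the first token of each line as the id.

```python
import re

# Stand-in for the file content; two records copied from the sample data.
text = """E932 4 1240153200000 #id of an event
M48462 #id of organizer
M48462 #id of accepted invite
M65542 #id of rejected invite

E932 4 1240153200000
M48462
M48462
M65542

"""

# Split on blank lines; strip() avoids an empty block after the last record.
blocks = re.split(r"\n\s*\n", text.strip())
# One list of ids per record, keeping only the first token of every line.
records = [[line.split()[0] for line in block.split("\n")] for block in blocks]
# Transpose the records into one list per kind of id.
event_id, org_id, accept_id, reject_id = (list(col) for col in zip(*records))
print(event_id, org_id, accept_id, reject_id)
```

The `zip(*records)` step assumes every record has exactly four lines. A record with missing lines would need to be handled before transposing.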