
How to extract certain data on different lines from a file in Python


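My code for the events looks like this, and it works. But when I tried to modify it to get the other IDs as well, it doesn't work; clearly it isn't right.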

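Events, Org, Yes, No = [], [], [], []  # result lists (not shown in the original snippet)
nlines = 1  # assumed initial state so that the first event line can match

with open('GroupEvent/G0.txt') as f:
    lines = f.readlines()

for i in range(0, len(lines)):
    if lines[i] == '\n':
        nlines = 0  # note: the event branch below waits for 1, so later events never match
    else:
        line = lines[i]
        entry = line.split()
        # note: nlines advances per token, not per line, so extra tokens
        # on one line spill over into the next list
        for x in entry:
            first_char = x
            EventToMatch = ('E')
            if first_char.startswith(EventToMatch) and nlines == 1:
                Events.append(first_char)
                nlines = 2
                break
            elif nlines == 2:
                Org.append(first_char)
                nlines = 3
            elif nlines == 3:
                Yes.append(first_char)
                nlines = 4
            elif nlines == 4:
                No.append(first_char)
                nlines == 0  # note: comparison, not assignment, so nlines stays 4
            else:
                break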

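I have a file in which the first line of each record holds an ID starting with E, the ID of a particular event. The second line holds the ID of the person who organized it, the third line the IDs of the people who accepted the invitation, and the fourth line the IDs of those who declined. The file contains dozens of such records, separated by blank lines. How can I collect the organizer IDs and the IDs of the people who said yes and no? Capturing the event IDs was easy because they start with E, and I get a list of event IDs, but I can't extract the others.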
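Two things stop the loop above from working: `nlines` advances once per token instead of once per line, and the blank-line reset sets it to 0 while the event branch waits for 1. Below is a minimal sketch of the same line-by-line idea that tracks the position inside the current record instead. It assumes each record is exactly four lines (event, organizer, accepted, declined) separated by blank lines, and that the accepted/declined lines may hold several space-separated IDs (the question says "ids", plural). The function name `collect_ids` is hypothetical, and text after `#` is treated as an annotation and dropped.

```python
def collect_ids(lines):
    """Collect event, organizer, accepted and declined IDs from an iterable of lines.

    Assumes 4-line records (event, organizer, accepted, declined) separated by
    blank lines. Returns four lists aligned by record: yes/no hold one list of
    IDs per event.
    """
    events, org, yes, no = [], [], [], []
    position = 0  # which line of the current record we are on (0..3)
    for raw in lines:
        # drop any '#' annotation, then split the line into IDs
        tokens = raw.split("#", 1)[0].split()
        if not tokens:  # blank line: the next non-blank line starts a new record
            position = 0
            continue
        if position == 0:
            if not tokens[0].startswith("E"):
                raise ValueError(f"expected an event ID, got {tokens[0]!r}")
            events.append(tokens[0])  # keep only the ID, not the numbers after it
        elif position == 1:
            org.append(tokens[0])
        elif position == 2:
            yes.append(tokens)  # every ID on the "accepted" line
        elif position == 3:
            no.append(tokens)  # every ID on the "declined" line
        position += 1  # advance once per line, not once per token
    return events, org, yes, no
```

Because iterating over a file yields its lines, the open file object can be passed straight in, e.g. `events, org, yes, no = collect_ids(f)` inside `with open('GroupEvent/G0.txt') as f:`.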

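If a file has a fixed structure, I usually use a class, as with FastQ files, for example. I put the following lines into input_file.txt, and the class returns them 5 lines at a time. You can do whatever you want with them from there.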

input_file.txt

E932 4 1240153200000 #id of an event
M48462 #id of organizer
M48462 #id of accepted invite
M65542 #id of rejected invite

E932 4 1240153200000
M48462
M48462
M65542

E932 4 1240153200000
M48462
M48462
M65542

E932 4 1240153200000
M48462
M48462
M65542

The class that handles it:

class HandleFile:
    """Read a text file in fixed blocks of 5 lines (4 data lines + 1 blank line)."""

    def __init__(self, filename):
        self.input = open(filename, "r")  # assuming it is a text file
        self.currentLine = 0  # count of lines read so far

    def __iter__(self):
        return self

    def __next__(self):
        mylist = []
        for i in range(5):  # each record is 5 lines long
            line = self.input.readline()
            self.currentLine += 1
            if line:
                mylist.append(line.strip("\n"))
            else:
                mylist.append(None)  # readline() returns "" at end of file
        if mylist.count(None) == 5:  # nothing left to read: end of file
            raise StopIteration
        if mylist[4] is None:  # last record has no trailing blank line
            mylist[4] = ""
        assert mylist[4] == ""  # the 5th line must be the blank separator
        assert mylist[0].startswith("E")  # or add more checks
        return mylist

hf = HandleFile("input_file.txt")
for lst in hf:
    print(lst)

Here is the output:

...
['E932 4 1240153200000 #id of an event', 'M48462 #id of organizer', 'M48462 #id of accepted invite', 'M65542 #id of rejected invite', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
>>>

Note: this code was adapted from here.
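The class above assumes every record is exactly five lines. As an alternative sketch, the lines can be grouped on blank separators with `itertools.groupby`, so records of any length work and a missing blank line at the end of the file is harmless. The generator name `records` is hypothetical, not part of the answer above.

```python
from itertools import groupby


def records(lines):
    """Yield each blank-line-separated record as a list of lines without newlines."""
    stripped = (line.rstrip("\n") for line in lines)
    # groupby clusters consecutive lines by whether they are blank
    # (empty or whitespace-only); the non-blank clusters are the records
    for is_blank, group in groupby(stripped, key=lambda s: not s.strip()):
        if not is_blank:
            yield list(group)
```

Usage mirrors the class: inside `with open("input_file.txt") as f:`, loop with `for rec in records(f): print(rec)`. Unlike the class, each list has no trailing `''` separator, and the `with` block closes the file.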

If all you want is lists of the specific IDs, one method I have used in the past is:

# initialize lists for the IDs you want
event_id = []
org_id = []
accept_id = []
reject_id = []

# open the file with your data and read it
with open("FILENAME.txt", "r") as file:
    content = file.read()

# now split the text at every blank line by using "\n" twice,
# just like hitting return twice to get a blank line;
# strip() first so a trailing newline at the end of the file
# does not produce an empty last record
split_content = content.strip().split("\n\n")

# what I found easiest was to first create a list of lists,
# one inner list per record, to separate each group of information
mylist = [item.split("\n") for item in split_content]

# now append the content associated with each position
# to the lists you made at the beginning
for e in mylist:
    event_id.append(e[0])
    org_id.append(e[1])
    accept_id.append(e[2])
    reject_id.append(e[3])

# now all your IDs are separated into their respective lists;
# you can also append them to separate files if you like, e.g. the
# organizer IDs ("ORG_IDS.txt" is a placeholder file name)
with open("ORG_IDS.txt", "a+") as file_to_append:
    for org in org_id:
        file_to_append.write(org + "\n")
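Note that `e[0]` in this approach is the whole event line (`E932 4 1240153200000`), not just the event ID. A sketch that keeps the split-on-blank-lines idea but returns one dict per record, taking the first token of the event and organizer lines and every token of the accepted/rejected lines. Assumptions: `parse_records` and the dict keys are hypothetical names, a "blank" line may contain stray spaces, each record has all four lines, and anything after `#` is an annotation.

```python
import re


def parse_records(content):
    """Return one dict per record: event, organizer, accepted, rejected."""
    parsed = []
    # split on one or more blank (possibly whitespace-only) lines
    for block in re.split(r"\n\s*\n", content.strip()):
        # drop any '#' annotation, then split each line into IDs
        rows = [line.split("#", 1)[0].split() for line in block.splitlines()]
        parsed.append({
            "event": rows[0][0],      # first token only, e.g. "E932"
            "organizer": rows[1][0],
            "accepted": rows[2],      # every ID on the line
            "rejected": rows[3],
        })
    return parsed
```

The flat lists from the answer can then be rebuilt from the dicts, e.g. `event_id = [r["event"] for r in parse_records(content)]`, while keeping each record's IDs together when that is more useful.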


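Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM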