Splitting a txt file into lists by a symbol (Python 3)

Question

I have a txt file, roughly 1-10 MB, with lines like this:

"Type,int_data,ID,..., some data"

I want to split the lines by ID. For example, by doing the following:

list_1=[]
list_2=[]
list_3=[]
.. 
list_7=[]

with open(txt, 'r', encoding='utf-8') as f:
    for string in f:
        string = string.rstrip().split(',')
        ID = int(string[2])
        if ID == 1:
            list_1.append(string)
        elif ID == 2:
            list_2.append(string)
            ..

But this is slow. Is there a better way?

Answer 1

How about this? It may not be that much faster, but give it a try and let me know!

from collections import defaultdict

res = defaultdict(list)  # dict of lists, keyed by ID
with open(txt, 'r', encoding='utf-8') as f:
    for line in f:
        # the ID is the third comma-separated field; keys are strings here
        # and each stored line keeps its trailing newline
        res[line.split(',')[2]].append(line)
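
As a small runnable sketch of the same idea, the snippet below groups a few made-up lines by the third field. The sample rows and the name `group_by_id` are hypothetical and not from the original post. Unlike the answer above, it converts the ID to an int and stores the split fields, which is closer to what the question's loop does.

```python
import io
from collections import defaultdict

def group_by_id(lines, id_index=2):
    """Group comma-separated lines by the integer found at id_index."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) <= id_index:
            continue  # assumption: blank or short lines can be skipped
        groups[int(fields[id_index])].append(fields)
    return groups

# made-up sample data standing in for the real file
sample = io.StringIO("A,10,1,foo\nB,20,2,bar\nC,30,1,baz\n")
groups = group_by_id(sample)
print(sorted(groups))   # the IDs that were seen
print(len(groups[1]))   # number of rows with ID 1
```

With a real file, the open file object can be passed in directly, for example `with open(path, encoding='utf-8') as f: groups = group_by_id(f)`, where `path` is whatever your file is called.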

Answer 2

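This is a code snippet I have used before on files of 100 MB and up (I am not the author). I don't know whether it will help at your file size or whether the overhead will be too much. Basically, it first divides the file into chunks of characters (chunkify), then creates one job per chunk that reads from the start of that chunk to its end. These jobs are then distributed to your process pool so that you can use all your cores, while keeping down the number of times data is sent to and received from the workers.

For your case, just add a 'process' function for 'process_wrapper' to call on each line, much like what @Keerthi Bachu has.

This may work, or it may give you some ideas.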

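import multiprocessing as mp
import os

def process(line):
    # placeholder: put your own per-line handling here
    pass

def process_wrapper(chunkStart, chunkSize):
    # binary mode, so the offsets are byte offsets that match chunkify
    with open("input.txt", "rb") as f:
        f.seek(chunkStart)
        lines = f.read(chunkSize).splitlines()
        for line in lines:
            process(line.decode("utf-8"))

def chunkify(fname, size=1024*1024):
    fileEnd = os.path.getsize(fname)
    # binary mode: Python 3 does not allow relative seeks on text files
    with open(fname, "rb") as f:
        chunkEnd = f.tell()
        while True:
            chunkStart = chunkEnd
            f.seek(size, 1)
            f.readline()  # move on to the end of the current line
            chunkEnd = f.tell()
            yield chunkStart, chunkEnd - chunkStart
            if chunkEnd >= fileEnd:
                break

if __name__ == "__main__":
    #init objects
    cores = mp.cpu_count()  # assumption: one worker per core
    pool = mp.Pool(cores)
    jobs = []

    #create jobs
    for chunkStart, chunkSize in chunkify("input.txt"):
        jobs.append(pool.apply_async(process_wrapper, (chunkStart, chunkSize)))

    #wait for all jobs to finish
    for job in jobs:
        job.get()

    #clean up
    pool.close()
    pool.join()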
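
One thing to keep in mind with this approach: each worker runs in a separate process, so appending to a list defined in the parent would not be visible there. A possible way around it, sketched below, is to have each job return its own grouping and merge the pieces afterwards. The names `group_chunk` and `merge_groups` are hypothetical helpers, not part of the snippet above, and the sample lines are made up.

```python
from collections import defaultdict

def group_chunk(lines, id_index=2):
    """Group the lines of one chunk by ID and return a plain dict."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip().split(",")
        if len(fields) > id_index:  # assumption: short lines are ignored
            groups[int(fields[id_index])].append(fields)
    return dict(groups)

def merge_groups(partials):
    """Merge per-chunk results in the order the chunks are given."""
    merged = defaultdict(list)
    for partial in partials:
        for key, rows in partial.items():
            merged[key].extend(rows)
    return dict(merged)

# two made-up chunks standing in for what two jobs would return
chunk_a = group_chunk(["A,10,1,foo", "B,20,2,bar"])
chunk_b = group_chunk(["C,30,1,baz"])
result = merge_groups([chunk_a, chunk_b])
print(result[1])
```

In the pool version, `process_wrapper` could return `group_chunk(...)` for its decoded lines, and the parent could then call `merge_groups(job.get() for job in jobs)`. For a file of only 1-10 MB, the cost of starting processes and passing results back may well outweigh any gain, so it is worth timing both variants.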

2 solutions

Solution 1
1 2018-04-14 11:29:52

Solution 2
1 accepted 2018-04-14 11:54:57
