使用python提高讀寫txt文件的性能

Question

我使用python管理txt文件，讀寫多次。 但是如果我有 1000 個 txt 文件，我的代碼需要很長時間來管理這些文件。

如何提高管理這些文件的性能？

我有包含此信息的文件：

position   temperature
0,0        30,10
1,0        45,12
2,0        20,45 (...)

首先，我需要刪除帶字符串的行。 對於 taht 我搜索字符串並創建新的 txt 文件並將沒有帶字符串的行的信息復制到新的 txt 文件中。 我明白了：

0,0       30,10
1,0       45,12
2,0       20,45 (...)

然后我將 , 替換為。 在所有文件中，再次創建一個新文件並處理指向這些新文件的信息。 我明白了：

0.0      30.10
1.0      45.12
2.0      20.45 (...)

然后我需要用最小位置值 (a) 和最大值 (b) 來限制信息。 所以在第一列中，我只想要 a 和 b 之間的線。 我再次創建新文件並將我想要的信息復制到這些文件中。

最后，我向每個文件添加了一個新列，其中包含一些信息。

所以，我認為我的代碼中的耗時與我創建新文件、復制信息和用這些新文件替換舊文件的時間有關。

也許這一切都可以一步完成。 我只需要知道是否有可能。

謝謝

Answer 1

您無需創建新文件只是為了刪除第一行或將逗號替換為行中的點。 您可以在內存中完成所有工作，即從文件中讀取數據，將逗號替換為點，將值轉換為浮點數，對它們進行排序，修剪最小值和最大值並將結果寫入文件，如下所示：

input_file = open('input_file', 'r')
data = []
input_file.readline() # first line with titles
for line in input_file: # lines with data
    data.append(map(lambda x: float(x.replace(',', '.'), line.split()))
input_file.close()

data.sort(key=lambda x: x[1])
data = data[1:-1]

result_file = open('result_file', 'w')
result_file.writelines(['\t'.join(row) for row in data])

result_file.close()

Answer 2

你肯定是在以艱難的方式做這件事......你的問題描述中缺少很多細節信息，但假設有一些事情 - 比如標題總是在第一行，“位置”和“溫度”應該是成為浮點數等 - 這是一個代碼示例，主要完成您在單次傳遞中描述的內容：

import sys
from itertools import ifilter

def parse(path):
    with open(path) as f:
        # skip the headers
        f.next()

        # parse data
        for lineno, line in enumerate(f, 1):
            try:
                position, temperature = line.split()
                position = float(position.replace(",", "."))
                temperature = float(temperature.replace(",","."))
            except ValueError as e:
                raise ValueError("Invalid line at %s:#%s" % (path, lineno))

            yield position, temperature


def process(inpath, outpath, minpos, maxpos):
    # warning: exact comparisons on floating points numbers
    # are not safe
    filterpos = lambda r: minpos <= r[0] <= maxpos 

    with open(outpath, "w") as outfile:
        for pos, temp in ifilter(filterpos, parse(inpath)):
            extra_col = compute_something_from(pos, temp)
            out.writeline("{}\t{}\t{}\t{}\n".format(pos, temp, extra_col))


def compute_something_from(pos, temp):
    # anything
    return pos * (temp / 3.)


def main(*args):
    # TODO : 
    # - clean options / args handling
    # - outfile naming ? 
    minpos = float(args[0])
    maxpos = float(args[1])
    for inpath in args[2:]:
        outpath = inpath + ".tmp"
        process(inpath, outpath, minpos, maxpos)

if __name__ == "__main__":
    main(*sys.argv[1:])

使用python提高讀寫txt文件的性能

問題描述

2 個解決方案

解決方案1
1 2015-10-05 12:08:28

解決方案2
-1 2015-10-05 12:09:46

使用python提高讀寫txt文件的性能

問題描述

2 個解決方案

解決方案1 1 2015-10-05 12:08:28

解決方案2 -1 2015-10-05 12:09:46

解決方案1
1 2015-10-05 12:08:28

解決方案2
-1 2015-10-05 12:09:46