Fastest way to read input in Python

Question

I want to read a huge text file that contains a list of lists of integers. Right now I'm doing the following:

G = []
with open("test.txt", 'r') as f:
    for line in f:
        G.append(list(map(int,line.split())))

However, it takes about 17 seconds (measured with timeit). Is there any way to reduce this time? Maybe there is a way that doesn't use map.

Answer 1

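numpy has the functions loadtxt and genfromtxt, but neither is particularly fast. One of the fastest text readers available in a widely distributed library is the read_csv function in pandas (http://pandas.pydata.org/). On my computer, reading 5 million lines containing two integers per line takes about 46 seconds with numpy.loadtxt, 26 seconds with numpy.genfromtxt, and a little over 1 second with pandas.read_csv.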

Here's the session showing the result. (This is on Linux, Ubuntu 12.04 64 bit. You can't see it here, but after each reading of the file, the disk cache was cleared by running sync; echo 3 > /proc/sys/vm/drop_caches in a separate shell.)

In [1]: import pandas as pd; from numpy import loadtxt, genfromtxt

In [2]: %timeit -n1 -r1 loadtxt('junk.dat')
1 loops, best of 1: 46.4 s per loop

In [3]: %timeit -n1 -r1 genfromtxt('junk.dat')
1 loops, best of 1: 26 s per loop

In [4]: %timeit -n1 -r1 pd.read_csv('junk.dat', sep=' ', header=None)
1 loops, best of 1: 1.12 s per loop

Answer 2

pandas, which is based on numpy, has a C-based file parser that is very fast:

# generate some integer data (5 M rows, two cols) and write it to file
In [24]: data = np.random.randint(1000, size=(5 * 10**6, 2))

In [25]: np.savetxt('testfile.txt', data, delimiter=' ', fmt='%d')

# your way
In [26]: def your_way(filename, sep):
   ...:     G = []
   ...:     with open(filename, 'r') as f:
   ...:         for line in f:
   ...:             # split on the same separator the file was written with
   ...:             G.append(list(map(int, line.split(sep))))
   ...:     return G
   ...: 

In [26]: %timeit your_way('testfile.txt', ' ')
1 loops, best of 3: 16.2 s per loop

In [27]: %timeit pd.read_csv('testfile.txt', delimiter=' ', dtype=int)
1 loops, best of 3: 1.57 s per loop

So pandas.read_csv takes about one and a half seconds to read your data and is about 10 times faster than your method.

Answer 3

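As a general rule of thumb (for just about any language), using read() to read in the entire file is going to be quicker than reading one line at a time. If you're not constrained by memory, read the whole file at once, then split the data on newlines and iterate over the list of lines.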
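A minimal sketch of that idea, assuming the file fits comfortably in memory and holds whitespace-separated integers as in the question. The function name read_all_at_once and the sample data are made up for illustration, and whether this actually beats line-by-line iteration on your file is something to measure with timeit rather than take for granted.

```python
import os
import tempfile


def read_all_at_once(filename):
    """Read the whole file with a single read() call, then parse it line by line."""
    with open(filename, "r") as f:
        text = f.read()  # one large read instead of one read per line
    # splitlines() drops the newline characters; split() with no argument
    # splits each line on any run of whitespace.
    return [[int(item) for item in line.split()] for line in text.splitlines()]


# Small self-check on a throwaway file (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 5\n6\n")
    sample_path = tmp.name

G = read_all_at_once(sample_path)
os.remove(sample_path)
```

The result has the same list-of-lists shape as G in the question, so the rest of the program should not need to change.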

Answer 4

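The easiest speedup is to go for PyPy http://pypy.org/

The next issue is to NOT read the whole file into memory at all (if possible). Instead, process it like a stream.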
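To make the streaming suggestion concrete, here is a rough sketch that assumes the downstream work can be done one row at a time (here, a running total). The names iter_rows, total and row_count are hypothetical, and io.StringIO stands in for an open file object.

```python
import io


def iter_rows(lines):
    """Yield one list of ints per input line without keeping earlier rows."""
    for line in lines:
        yield [int(item) for item in line.split()]


# io.StringIO stands in for a real file; with a file you would write
# `with open("test.txt") as f:` and pass f to iter_rows instead.
stream = io.StringIO("1 2\n3 4\n5 6\n")

total = 0
row_count = 0
for row in iter_rows(stream):
    # Each row is processed and then discarded, so memory use stays flat
    # no matter how long the input is.
    total += sum(row)
    row_count += 1
```

This only helps if you never need the full list G at once; if you do, the parsing cost is the same and only the peak memory differs.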

Answer 5

List comprehensions are often faster.

G = [[int(item) for item in line.split()] for line in f]

Beyond that, try PyPy, Cython, and numpy.

Answer 6

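You might also try to bring the data into a database via bulk insert, then process your records with set operations. Depending on what you have to do, that may be faster, as bulk-insert software is optimized for this type of task.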
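The answer doesn't name a database, so the following is only a sketch using the sqlite3 module from the standard library as a stand-in. It assumes two integers per line (as in the benchmarks above); the table name pairs, the column names a and b, and the helper load_pairs are invented for the example.

```python
import io
import sqlite3


def load_pairs(conn, lines):
    """Bulk-insert whitespace-separated integer pairs into a table named pairs."""
    conn.execute("CREATE TABLE pairs (a INTEGER, b INTEGER)")
    # A generator keeps only one parsed row in memory at a time.
    rows = (tuple(int(item) for item in line.split()) for line in lines)
    with conn:  # wrap all inserts in a single transaction
        conn.executemany("INSERT INTO pairs (a, b) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
load_pairs(conn, io.StringIO("1 2\n3 4\n5 6\n"))

# Set-based processing: the database does the aggregation instead of a Python loop.
row_count, sum_a, max_b = conn.execute(
    "SELECT COUNT(*), SUM(a), MAX(b) FROM pairs"
).fetchone()
conn.close()
```

Whether this is worthwhile depends on what happens after loading: it pays off mainly when the later processing maps naturally onto queries, not when you just need a Python list.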

6 solutions

Solution 1  24  2013-02-26 19:26:07
Solution 2  5  2013-02-26 19:46:51
Solution 3  1  2013-02-26 18:28:49
Solution 4  0  2013-02-26 18:19:17
Solution 5  0  2013-02-26 18:20:01
Solution 6  0  2013-02-26 18:26:31
