Creating a list from a file in Python

Question

The file contains:

1 19 15 36 23 18 39 
2 36 23 4 18 26 9
3 35 6 16 11

From it I want to extract a list like this:

L = [1, 19, 15, 36, 23, 18, 39, 2, 36, ........... etc.]

What is the most efficient way to do this?

Answer 1

You can use itertools.chain, splitting each line and mapping the pieces to int:

from itertools import chain
with open("in.txt") as f:
    print(list((map(int,chain.from_iterable(line.split() for line in f)))))
[1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

For Python 2, use itertools.imap instead of map. Combining map (or imap) with itertools.chain avoids reading the whole file into memory at once, which is what .read() would do.
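
As a rough illustration of the line-by-line approach, the chain call can be wrapped in a generator so that callers pull one integer at a time. This is only a sketch: the helper name iter_ints and the temporary file are assumptions made for the demo, not part of the answer.

```python
import os
import tempfile
from itertools import chain


def iter_ints(path):
    """Yield the integers of a whitespace-separated file, reading it line by line."""
    with open(path) as f:
        # chain.from_iterable flattens the per-line token lists lazily
        for token in chain.from_iterable(line.split() for line in f):
            yield int(token)


# Throwaway copy of the sample input (hypothetical temp file for the demo)
sample = "1 19 15 36 23 18 39 \n2 36 23 4 18 26 9\n3 35 6 16 11\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

result = list(iter_ints(path))
os.remove(path)
print(result)
# [1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]
```

Calling list() on the generator gives the same list as the one-liner above. Iterating over it directly keeps only one line's tokens in memory at a time.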

Some Python 3 timings on a file that is the sample input * 1000:

In [5]: %%timeit
with open("ints.txt","r") as f:
    list(map(int,re.split(r"\s+",f.read())))
   ...: 
100 loops, best of 3: 8.55 ms per loop

In [6]: %%timeit                                                
with open("ints.txt","r") as f:
    list((map(int, chain.from_iterable(line.split() for line in f))))
   ...: 
100 loops, best of 3: 5.76 ms per loop

In [7]: %%timeit
...: with open("ints.txt","r") as f:
...:      [int(i) for i in f.read().split()]
...: 
100 loops, best of 3: 5.82 ms per loop

So itertools matches the list comp but uses a lot less memory.

For Python 2:

In [3]: %%timeit                                                
with open("ints.txt","r") as f:
     [int(i) for i in f.read().split()]
   ...: 
100 loops, best of 3: 7.79 ms per loop

In [4]: %%timeit                                                
with open("ints.txt","r") as f:
    list(imap(int, chain.from_iterable(line.split() for line in f)))
   ...: 
100 loops, best of 3: 8.03 ms per loop

In [5]: %%timeit                                                
with open("ints.txt","r") as f:
    list(imap(int,re.split(r"\s+",f.read())))
   ...: 
100 loops, best of 3: 10.6 ms per loop

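The list comp is slightly faster but again uses more memory. If you are going to read everything into memory with the read/split approach, imap is again the fastest: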

In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:      list(imap(int, f.read().split()))
   ...: 
100 loops, best of 3: 6.85 ms per loop

The same holds for Python 3 and map:

In [4]: %%timeit                                                
with open("ints.txt","r") as f:
     list(map(int,f.read().split()))
   ...: 
100 loops, best of 3: 4.41 ms per loop

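So if speed is all you care about, use the list(map(int,f.read().split())) or list(imap(int,f.read().split())) approach.
If memory is also a concern, combine it with chain. Another advantage of the chain approach is that if you are passing the ints to a function or iterating over them, you can pass the chain object directly, so you never need to hold all the data in memory.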
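
A minimal sketch of that last point, assuming the consumer only needs a running total and a maximum. The io.StringIO object stands in for an open file here, which is an assumption made to keep the example self-contained.

```python
import io
from itertools import chain

# io.StringIO stands in for an open file (assumption for the demo)
f = io.StringIO("1 19 15 36 23 18 39 \n2 36 23 4 18 26 9\n3 35 6 16 11\n")

# Lazy pipeline: nothing is read or converted until we iterate
ints = map(int, chain.from_iterable(line.split() for line in f))

total = 0
largest = None
for n in ints:  # values arrive one at a time, no intermediate list is built
    total += n
    if largest is None or n > largest:
        largest = n

print(total, largest)  # 340 39
```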

One last small optimisation is to map str.split over the file object:

In [5]: %%timeit
with open("ints.txt", "r") as f:
    list((map(int, chain.from_iterable(map(str.split, f)))))
   ...: 
100 loops, best of 3: 5.32 ms per loop
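
The timings above come from IPython's %%timeit and will differ between machines and Python versions. One way to rerun a comparable measurement as a plain script is sketched below, assuming Python 3. The function names and the generated benchmark file are made up for the sketch, and no particular numbers should be expected.

```python
import os
import tempfile
import timeit
from itertools import chain

# Hypothetical benchmark file: the sample input repeated 1000 times
sample = "1 19 15 36 23 18 39\n2 36 23 4 18 26 9\n3 35 6 16 11\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample * 1000)
    path = tmp.name


def read_split():
    with open(path) as f:
        return list(map(int, f.read().split()))


def chain_lines():
    with open(path) as f:
        return list(map(int, chain.from_iterable(map(str.split, f))))


def list_comp():
    with open(path) as f:
        return [int(i) for i in f.read().split()]


candidates = [read_split, chain_lines, list_comp]
results = [fn() for fn in candidates]  # sanity check: all should agree

for fn in candidates:
    # best of 3 runs, 10 calls per run, reported per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print("%-12s %.2f ms per call" % (fn.__name__, best * 1e3))

os.remove(path)
```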

Answer 2

with open('yourfile.txt') as f:
    your_list = f.read().split()

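That gives a list of strings. To cast them to integers, you can use a list comprehension (inside the with block, in place of the assignment above):

your_list = [int(i) for i in f.read().split()]

This raises a ValueError if a value cannot be converted to an integer.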
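
If the file may contain tokens that are not integers, one option is to catch the exception per token. The helper below is a hypothetical sketch, not part of the answer: the name parse_ints and the skip_invalid flag are invented for illustration.

```python
def parse_ints(text, skip_invalid=False):
    """Convert whitespace-separated tokens to ints (hypothetical helper)."""
    values = []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            if not skip_invalid:
                # Re-raise with the offending token named in the message
                raise ValueError("not an integer: %r" % token)
    return values


clean = parse_ints("1 19 15")
filtered = parse_ints("1 x 15 2.5", skip_invalid=True)
print(clean)     # [1, 19, 15]
print(filtered)  # [1, 15]
```

Silently skipping bad tokens can hide data problems, so the strict default is probably the safer choice in most cases.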

Answer 3

import re
with open("output.txt", "r") as f:
    # strip() avoids an empty trailing field when the file ends with whitespace
    print(list(map(int, re.split(r"\s+", f.read().strip()))))

You can use re.split, which returns a list, and map that to int.
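
One difference from str.split() is worth checking before relying on re.split here: when the text starts or ends with whitespace, re.split produces an empty string, and int('') raises a ValueError. A small sketch with a made-up input string:

```python
import re

# Trailing space and final newline, as many text files have (made-up input)
text = "1 19 15 \n2 36\n"

plain = text.split()                          # str.split() drops empty fields
regex = re.split(r"\s+", text)                # leaves '' after the last newline
stripped = re.split(r"\s+", text.strip())     # stripping first removes it

print(plain)     # ['1', '19', '15', '2', '36']
print(regex)     # ['1', '19', '15', '2', '36', '']
print(stripped)  # ['1', '19', '15', '2', '36']
```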

Answer 4

If you can use the numpy library, another option is np.fromstring(), giving it the file's .read() as input. Example -

import numpy as np
with open('file.txt','r') as f:
    lst = np.fromstring(f.read(),sep=' ',dtype=int)

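In the end lst will be a numpy array. If you want a Python list, use list(lst).

numpy.fromstring() always returns a 1-D array, and when you give a space as the separator it ignores extra whitespace, which includes newlines.

Example/Demo -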

In [39]: import numpy as np

In [40]: with open('a.txt','r') as f:
   ....:     lst = np.fromstring(f.read(),sep=' ',dtype=int)
   ....:

In [41]: lst
Out[41]:
array([ 1, 19, 15, 36, 23, 18, 39,  2, 36, 23,  4, 18, 26,  9,  3, 35,  6,
       16, 11])

In [42]: list(lst)
Out[42]: [1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

Timing tests -

In [47]: def func1():
   ....:     with open('a.txt','r') as f:
   ....:         lst = np.fromstring(f.read(),sep=' ',dtype=int)
   ....:         return list(lst)
   ....:
In [37]: def func2():
   ....:     with open('a.txt','r') as f:
   ....:         return list((map(int,chain.from_iterable(line.split() for line in f))))
   ....:

In [54]: def func3():
   ....:     with open('a.txt','r') as f:
   ....:         return np.fromstring(f.read(),sep=' ',dtype=int)
   ....:

In [55]: %timeit func3()
10000 loops, best of 3: 183 µs per loop

In [56]: %timeit func1()
10000 loops, best of 3: 194 µs per loop

In [57]: %timeit func2()
10000 loops, best of 3: 212 µs per loop

If you are fine with a numpy.ndarray (which is not that different from a list), that would be faster.

Answer 5

You can use re.findall.

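import re
with open(file) as f:  # file is the path to the input file
    # \d+ matches unsigned digit runs only, so a minus sign would be dropped
    print(list(map(int, re.findall(r'\d+', f.read()))))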
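
The pattern \d+ is fine for the sample data, which has no negative numbers. If the input could contain them, an optional leading minus sign is one possible adjustment. The input string below is made up for the sketch.

```python
import re

text = "3 -35 6\n-16 11\n"  # hypothetical input with negative values

unsigned = [int(m) for m in re.findall(r"\d+", text)]
signed = [int(m) for m in re.findall(r"-?\d+", text)]

print(unsigned)  # [3, 35, 6, 16, 11]
print(signed)    # [3, -35, 6, -16, 11]
```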

Answer 1
5 2015-08-08 12:19:39

Answer 2
3 2015-08-08 12:21:25

Answer 3
2 2015-08-08 12:21:13

Answer 4
1 2015-08-08 12:45:38

Answer 5
0 2015-08-08 12:26:38
