Creating List From File In Python
The file contains:
1 19 15 36 23 18 39
2 36 23 4 18 26 9
3 35 6 16 11
From that, I would like to extract a list like this:
L = [1,19,15,36,23,18,39,2,36........... etc.]
What is the most efficient way to do this?
You can use itertools.chain, splitting each line and mapping the pieces to int:
from itertools import chain
with open("in.txt") as f:
    print(list(map(int, chain.from_iterable(line.split() for line in f))))
[1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]
For Python 2, use itertools.imap instead of map. Combining map with itertools.chain avoids reading the whole file into memory at once, which is what .read() would do.
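As a self-contained sketch of the approach above, the chain can be wrapped in a generator so the caller decides whether to build a list at all. The helper name iter_ints and the temporary file are assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile
from itertools import chain


def iter_ints(path):
    """Lazily yield every whitespace-separated integer in the file at path.

    iter_ints is a hypothetical helper name used only for this sketch.
    """
    with open(path) as f:
        # Only one line of the file is held in memory at a time.
        for token in chain.from_iterable(line.split() for line in f):
            yield int(token)


# Write the sample data from the question to a temporary file.
sample = "1 19 15 36 23 18 39\n2 36 23 4 18 26 9\n3 35 6 16 11\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

try:
    numbers = list(iter_ints(tmp.name))
    print(numbers)
finally:
    os.remove(tmp.name)
```

Calling list() on the generator gives the flat list the question asks for; looping over it directly avoids building that list.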
Some Python 3 timings on a file containing the same input repeated 1000 times:
In [5]: %%timeit
with open("ints.txt","r") as f:
    list(map(int,re.split(r"\s+",f.read())))
   ...:
100 loops, best of 3: 8.55 ms per loop
In [6]: %%timeit
with open("ints.txt","r") as f:
    list((map(int, chain.from_iterable(line.split() for line in f))))
   ...:
100 loops, best of 3: 5.76 ms per loop
In [7]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...:
100 loops, best of 3: 5.82 ms per loop
So the itertools version matches the list comprehension for speed but uses much less memory.
For Python 2:
In [3]: %%timeit
with open("ints.txt","r") as f:
    [int(i) for i in f.read().split()]
   ...:
100 loops, best of 3: 7.79 ms per loop
In [4]: %%timeit
with open("ints.txt","r") as f:
    list(imap(int, chain.from_iterable(line.split() for line in f)))
   ...:
100 loops, best of 3: 8.03 ms per loop
In [5]: %%timeit
with open("ints.txt","r") as f:
    list(imap(int,re.split(r"\s+",f.read())))
   ...:
100 loops, best of 3: 10.6 ms per loop
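The list comprehension is slightly faster but again uses more memory. If you are going to read everything into memory with the read-and-split approach, imap is again the fastest:
In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int, f.read().split()))
   ...: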
100 loops, best of 3: 6.85 ms per loop
The same holds for Python 3 and map:
In [4]: %%timeit
with open("ints.txt","r") as f:
    list(map(int,f.read().split()))
   ...:
100 loops, best of 3: 4.41 ms per loop
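So if speed is all you care about, use the list(map(int,f.read().split())) or list(imap(int,f.read().split())) approach.
If memory is also a concern, combine it with chain. Another advantage of the chain approach is that if you are passing the integers to a function or iterating over them, you can pass the chain object directly, so you never need to hold all the data in memory at once.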
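To illustrate passing the lazy object straight to a consumer, here is a minimal sketch assuming Python 3, where map is lazy. io.StringIO stands in for an open file, an assumption made only so the example runs without a file on disk.

```python
import io
from itertools import chain

# io.StringIO stands in for an open file here (an assumption for the demo);
# iterating over it yields lines just like a real file object.
f = io.StringIO("1 19 15 36 23 18 39\n2 36 23 4 18 26 9\n3 35 6 16 11\n")

# No list is created: each integer is produced on demand.
ints = map(int, chain.from_iterable(line.split() for line in f))

total = sum(ints)  # sum() consumes the iterator one value at a time
print(total)

# The iterator is single-use: once consumed, nothing is left.
leftover = list(ints)
print(leftover)
```

Because the iterator can only be consumed once, build a list instead if the values are needed more than once.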
One last small optimization is to map str.split over the file object:
In [5]: %%timeit
with open("ints.txt", "r") as f:
    list((map(int, chain.from_iterable(map(str.split, f)))))
   ...:
100 loops, best of 3: 5.32 ms per loop
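For readers without IPython, a rough sketch of a similar comparison using the standard timeit module follows. It is not the setup used for the timings above: the data is held in an in-memory io.StringIO rather than read from ints.txt, and the function names and the number=20 repeat count are arbitrary choices for this example, so absolute numbers will differ.

```python
import io
import timeit
from itertools import chain

# Sample data from the question repeated 1000 times, kept in memory.
DATA = "1 19 15 36 23 18 39\n2 36 23 4 18 26 9\n3 35 6 16 11\n" * 1000


def with_read_split():
    # Read everything, split on whitespace, map to int.
    return list(map(int, io.StringIO(DATA).read().split()))


def with_chain():
    # Split line by line and flatten with chain.from_iterable.
    f = io.StringIO(DATA)
    return list(map(int, chain.from_iterable(map(str.split, f))))


def with_list_comp():
    # Read everything and convert in a list comprehension.
    return [int(i) for i in io.StringIO(DATA).read().split()]


for func in (with_read_split, with_chain, with_list_comp):
    # number=20 is an arbitrary choice that keeps the run short.
    seconds = timeit.timeit(func, number=20)
    print("{}: {:.2f} ms per call".format(func.__name__, seconds / 20 * 1000))
```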
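with open('yourfile.txt') as f:
    your_list = f.read().split()
To cast the values to integers, you can use a list comprehension inside the with block instead:
    your_list = [int(i) for i in f.read().split()]
This raises a ValueError when a value cannot be converted to int.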
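If the file may contain tokens that are not integers, one possible way to deal with the exception is sketched below. The function name parse_ints and the skip_invalid flag are hypothetical, introduced only for this example.

```python
def parse_ints(text, skip_invalid=False):
    """Convert whitespace-separated tokens in text to a list of ints.

    parse_ints and skip_invalid are hypothetical names for this sketch.
    With skip_invalid=False, the first bad token raises ValueError.
    """
    result = []
    for token in text.split():
        try:
            result.append(int(token))
        except ValueError:
            if not skip_invalid:
                raise
            # Otherwise the bad token is silently dropped.
    return result


print(parse_ints("1 19 x 15", skip_invalid=True))
```

Whether to skip bad tokens or let the error propagate depends on how much you trust the input; silently dropping values can hide a corrupted file.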
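import re
with open("output.txt", "r") as f:
    # strip() drops the trailing newline so re.split yields no empty string
    print(list(map(int, re.split(r"\s+", f.read().strip()))))
You can use re.split, which returns a list, and map it to int.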
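One thing to watch for with re.split: when the text ends with whitespace, as most text files do with their final newline, the result ends with an empty string, and int('') raises ValueError. A small demonstration on a made-up sample string:

```python
import re

text = "1 19 15\n2 36 23\n"  # made-up sample with a trailing newline

raw = re.split(r"\s+", text)
print(raw)  # the last item is an empty string

clean = re.split(r"\s+", text.strip())
print(list(map(int, clean)))  # stripping first avoids the empty string

# str.split() with no argument also discards leading and trailing whitespace.
print(list(map(int, text.split())))
```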
If you can use the numpy library, another approach is np.fromstring(), giving it the file's .read() as input. Example -
import numpy as np
with open('file.txt','r') as f:
    lst = np.fromstring(f.read(),sep=' ',dtype=int)
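In the end, lst is a numpy array; if you want a Python list, use list(lst).
numpy.fromstring() always returns a 1-D array, and when you give a space as the separator it ignores extra whitespace, which includes newlines.
Example/Demo -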
In [39]: import numpy as np
In [40]: with open('a.txt','r') as f:
   ....:     lst = np.fromstring(f.read(),sep=' ',dtype=int)
   ....:
In [41]: lst
Out[41]:
array([ 1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6,
16, 11])
In [42]: list(lst)
Out[42]: [1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]
Performance test -
In [47]: def func1():
   ....:     with open('a.txt','r') as f:
   ....:         lst = np.fromstring(f.read(),sep=' ',dtype=int)
   ....:         return list(lst)
   ....:
In [37]: def func2():
   ....:     with open('a.txt','r') as f:
   ....:         return list((map(int,chain.from_iterable(line.split() for line in f))))
   ....:
In [54]: def func3():
   ....:     with open('a.txt','r') as f:
   ....:         return np.fromstring(f.read(),sep=' ',dtype=int)
   ....:
In [55]: %timeit func3()
10000 loops, best of 3: 183 µs per loop
In [56]: %timeit func1()
10000 loops, best of 3: 194 µs per loop
In [57]: %timeit func2()
10000 loops, best of 3: 212 µs per loop
If you are OK with a numpy.ndarray (which is not much different from a list), that is faster still.
You can use re.findall.
import re
with open(file) as f:  # file is the path to your input file
    print(list(map(int, re.findall(r'\d+', f.read()))))
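Note that the pattern \d+ matches only runs of digits, so a leading minus sign is dropped. The file in the question has no negative numbers, but if yours might, a pattern such as -?\d+ keeps the sign. A short sketch on a made-up sample string:

```python
import re

text = "1 -19 15\n2 36 -23\n"  # made-up sample containing negative values

unsigned = list(map(int, re.findall(r"\d+", text)))
print(unsigned)  # minus signs are lost

signed = list(map(int, re.findall(r"-?\d+", text)))
print(signed)  # minus signs are kept
```

Unlike split(), findall also pulls digits out of mixed tokens such as "a12b" without raising an error, which may or may not be what you want.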