
Creating List From File In Python

The file contains:

1 19 15 36 23 18 39 
2 36 23 4 18 26 9
3 35 6 16 11

From that I'd like to extract a list as follows:

L = [1, 19, 15, 36, 23, 18, 39, 2, 36, ... etc.]

What is the most efficient way to do so?

You can use itertools.chain, splitting each line and mapping to ints:

from itertools import chain
with open("in.txt") as f:
    print(list((map(int,chain.from_iterable(line.split() for line in f)))))
[1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

For Python 2, use itertools.imap instead of map. Combining map with itertools.chain processes the file line by line, so the whole file is never read into memory at once, which is what .read() does.

Some timings for Python 3 on a file containing your input repeated 1000 times:

In [5]: %%timeit
with open("ints.txt","r") as f:
    list(map(int,re.split(r"\s+",f.read())))
   ...: 
100 loops, best of 3: 8.55 ms per loop

In [6]: %%timeit                                                
with open("ints.txt","r") as f:
    list((map(int, chain.from_iterable(line.split() for line in f))))
   ...: 
100 loops, best of 3: 5.76 ms per loop

In [7]: %%timeit
...: with open("ints.txt","r") as f:
...:      [int(i) for i in f.read().split()]
...: 
100 loops, best of 3: 5.82 ms per loop

So the itertools version matches the list comprehension for speed but uses a lot less memory, because it never holds the whole file contents as a single string.

For python2:

In [3]: %%timeit                                                
with open("ints.txt","r") as f:
     [int(i) for i in f.read().split()]
   ...: 
100 loops, best of 3: 7.79 ms per loop

In [4]: %%timeit                                                
with open("ints.txt","r") as f:
    list(imap(int, chain.from_iterable(line.split() for line in f)))
   ...: 
100 loops, best of 3: 8.03 ms per loop

In [5]: %%timeit                                                
with open("ints.txt","r") as f:
    list(imap(int,re.split(r"\s+",f.read())))
   ...: 
100 loops, best of 3: 10.6 ms per loop

The list comprehension is marginally faster but again uses more memory. If you are going to read everything into memory with the read-and-split approach anyway, imap is again the fastest:

In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:      list(imap(int, f.read().split()))
   ...: 
100 loops, best of 3: 6.85 ms per loop

Same for python3 and map:

In [4]: %%timeit                                                
with open("ints.txt","r") as f:
     list(map(int,f.read().split()))
   ...: 
100 loops, best of 3: 4.41 ms per loop

So if speed is all you care about, use list(map(int, f.read().split())) on Python 3 or list(imap(int, f.read().split())) on Python 2.
If memory is also a concern, combine map with chain instead. A further advantage of the chain approach: if you are passing the ints to a function or just iterating over them, you can pass the chain object directly, so the data never has to be held in memory all at once.
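
As a rough illustration of consuming the values lazily, here is a minimal Python 3 sketch. The generator name iter_ints and the temporary sample file are hypothetical, introduced only for this example; the sample data mirrors the question's input.

```python
import os
import tempfile
from itertools import chain


def iter_ints(path):
    """Yield the integers in the file one at a time, reading line by line."""
    with open(path) as f:
        # The file stays open only while the caller is still iterating.
        yield from map(int, chain.from_iterable(line.split() for line in f))


# Hypothetical sample file with the same contents as the question's input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 19 15 36 23 18 39 \n2 36 23 4 18 26 9\n3 35 6 16 11\n")

total = sum(iter_ints(tmp.name))    # no intermediate list is built
largest = max(iter_ints(tmp.name))  # a fresh pass over the file
os.remove(tmp.name)

print(total, largest)  # 340 39
```

Each call to iter_ints re-reads the file, so this trades repeated I/O for a small memory footprint; build a list once if you need several passes over the same data.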

One last small optimisation is to map str.split on the file object:

In [5]: %%timeit
with open("ints.txt", "r") as f:
    list((map(int, chain.from_iterable(map(str.split, f)))))
   ...: 
100 loops, best of 3: 5.32 ms per loop

with open('yourfile.txt') as f:
    your_list = f.read().split()

To convert the values to integers, you can use a list comprehension:

with open('yourfile.txt') as f:
    your_list = [int(i) for i in f.read().split()]

This raises a ValueError if any value cannot be converted to an integer.
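
If the file might contain tokens that are not integers, one option is to catch the error per token. This is only a sketch: the helper name parse_ints is hypothetical, and collecting the rejected tokens instead of failing is an assumption about the behaviour you want.

```python
def parse_ints(text):
    """Split text on whitespace and convert each token to int.

    Returns (values, rejected): the parsed integers and the tokens that
    int() refused. Skipping bad tokens is an assumption; re-raise instead
    if malformed input should be treated as an error.
    """
    values, rejected = [], []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            rejected.append(token)
    return values, rejected


values, rejected = parse_ints("1 19 x15 36\n2 3.5 4\n")
print(values)    # [1, 19, 36, 2, 4]
print(rejected)  # ['x15', '3.5']
```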

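import re

with open("output.txt", "r") as f:
    # strip() first: trailing whitespace would otherwise leave an empty
    # string at the end of the split result, and int("") raises ValueError.
    print(list(map(int, re.split(r"\s+", f.read().strip()))))

You can use re.split, which returns a list of strings, and then map int over it.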

If you are okay with using the numpy library, another method is to pass the file's .read() output to np.fromstring(). Example:

import numpy as np
with open('file.txt','r') as f:
    lst = np.fromstring(f.read(),sep=' ',dtype=int)

At the end lst is a numpy array; if you want a Python list, use list(lst) (lst.tolist() additionally converts the elements to native Python ints).

numpy.fromstring() always returns a 1D array, and when you give a space as the separator, extra whitespace between elements, including newlines, is ignored.


Example/Demo -

In [39]: import numpy as np

In [40]: with open('a.txt','r') as f:
   ....:     lst = np.fromstring(f.read(),sep=' ',dtype=int)
   ....:

In [41]: lst
Out[41]:
array([ 1, 19, 15, 36, 23, 18, 39,  2, 36, 23,  4, 18, 26,  9,  3, 35,  6,
       16, 11])

In [42]: list(lst)
Out[42]: [1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

Performance testing -

In [47]: def func1():
   ....:     with open('a.txt','r') as f:
   ....:         lst = np.fromstring(f.read(),sep=' ',dtype=int)
   ....:         return list(lst)
   ....:
In [37]: def func2():
   ....:     with open('a.txt','r') as f:
   ....:         return list((map(int,chain.from_iterable(line.split() for line in f))))
   ....:

In [54]: def func3():
   ....:     with open('a.txt','r') as f:
   ....:         return np.fromstring(f.read(),sep=' ',dtype=int)
   ....:

In [55]: %timeit func3()
10000 loops, best of 3: 183 µs per loop

In [56]: %timeit func1()
10000 loops, best of 3: 194 µs per loop

In [57]: %timeit func2()
10000 loops, best of 3: 212 µs per loop

If you are okay with getting a numpy.ndarray (which is not that different from a list), func3 is the fastest of the three.

You can use re.findall.

import re

with open("in.txt") as f:  # substitute the path to your file
    print(list(map(int, re.findall(r'\d+', f.read()))))
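
Note that r'\d+' matches runs of digits only, so a leading minus sign would be dropped. The question's sample has no negative numbers, but if your file might, a pattern such as r'-?\d+' is one possible adjustment. A small sketch under that assumption, using a made-up input string:

```python
import re

# Made-up sample text containing negative values.
text = "1 19 -15 36\n2 -36 23\n"

# \d+ keeps only the digits, so the sign is lost.
unsigned = list(map(int, re.findall(r"\d+", text)))

# -?\d+ also captures an optional leading minus sign.
signed = list(map(int, re.findall(r"-?\d+", text)))

print(unsigned)  # [1, 19, 15, 36, 2, 36, 23]
print(signed)    # [1, 19, -15, 36, 2, -36, 23]
```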
