I hope you can answer my question. I am new to Python, so I am asking for your help. I want to open a file that contains the following lines, read each line, and store every character of it as a string in a list.
A B 2
A E 2
A W 1
B D 5
B W 4
B C 2
B F 3
C F 7
C V 9
D E 1
D J 7
E K 3
F L 2
F M 7
F R 3
F Y 1
G K 8
G J 5
I want to store the information from each line like this: the lines A B 2 and A E 2 should become ['A','B','2'],['A','E','2']
You can do the following:
with open('testfile.txt') as fp:
    content = [elem
               for line in fp.readlines()
               for elem in [line.split()]
               if elem]
print(content)
This yields
[['A', 'B', '2'], ['A', 'E', '2'], ['A', 'W', '1'], ['B', 'D', '5'], ['B', 'W', '4'], ['B', 'C', '2'], ['B', 'F', '3'], ['C', 'F', '7'], ['C', 'V', '9'], ['D', 'E', '1'], ['D', 'J', '7'], ['E', 'K', '3'], ['F', 'L', '2'], ['F', 'M', '7'], ['F', 'R', '3'], ['F', 'Y', '1'], ['G', 'K', '8'], ['G', 'J', '5']]
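The `if elem` filter in the comprehension works because `str.split()` with no arguments splits on any run of whitespace and returns an empty list for a blank line. A quick check, using made-up sample strings rather than the real file:

```python
# str.split() with no arguments splits on runs of whitespace
# and ignores leading/trailing whitespace, including the newline.
sample_lines = ['A B 2\n', '\n', 'A  E\t2\n']  # hypothetical sample input

parsed = [line.split() for line in sample_lines]
print(parsed)

# An empty list is falsy, so `if elem` skips the blank line.
non_blank = [elem for elem in parsed if elem]
print(non_blank)
```

So a blank line never needs to be stripped first; it simply produces an empty list that the filter drops.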
Alternatively, as an explicit loop:
data = []
with open(filename) as f:
    for line in f:
        line = line.rstrip()
        if line == '':
            continue
        data.append(line.split())
I compared the proposals posted here (three using a list comprehension and three using a for loop that appends to a list):
def f_jan(filename):
    with open(filename) as f:
        return [
            elem
            for line in f.readlines()
            for elem in [line.split()]
            if elem]

def f_mateen_ulhaq_1(filename):
    with open(filename) as f:
        return [
            elem.split()
            for elem in map(str.rstrip, f)
            if elem]

def f_ralf_1(filename):
    with open(filename) as f:
        return [
            line.split()
            for line in f
            if line != '\n']

def f_mateen_ulhaq_2(filename):
    data = []
    with open(filename) as f:
        for line in f:
            line = line.rstrip()
            if line == '':
                continue
            data.append(line.split())
    return data

def f_mateen_ulhaq_3(filename):
    data = []
    with open(filename) as f:
        for line in f:
            if line == '\n':
                continue
            data.append(line.split())
    return data

def f_ralf_2(filename):
    data = []
    with open(filename) as f:
        for line in f:
            if line != '\n':
                data.append(line.split())
    return data
I created two files: one with 100 lines of the sample input from the question, and another with 100,000 lines of the same input.
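The post does not say how these files were built. One way to produce files of that shape is to cycle through the 18 sample lines until the requested line count is reached. This is only a sketch under that assumption; the helper name `write_test_file` is made up for illustration:

```python
from itertools import cycle, islice

# The 18 sample lines from the question.
SAMPLE = """A B 2
A E 2
A W 1
B D 5
B W 4
B C 2
B F 3
C F 7
C V 9
D E 1
D J 7
E K 3
F L 2
F M 7
F R 3
F Y 1
G K 8
G J 5""".splitlines()

def write_test_file(filename, n_lines):
    """Write n_lines lines to filename by repeating SAMPLE in order."""
    with open(filename, 'w') as f:
        for line in islice(cycle(SAMPLE), n_lines):
            f.write(line + '\n')

write_test_file('test_100_lines.txt', 100)
write_test_file('test_100000_lines.txt', 100000)
```

Files written this way contain no blank lines, so the blank-line handling in each function is never exercised by the benchmark itself.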
I tested that they all return the same data:
filename_1 = 'test_100_lines.txt'
assert (f_jan(filename_1)
        == f_mateen_ulhaq_1(filename_1)
        == f_ralf_1(filename_1)
        == f_mateen_ulhaq_2(filename_1)
        == f_mateen_ulhaq_3(filename_1)
        == f_ralf_2(filename_1))
Then, using timeit, I compared the speed (with fewer repetitions for the large text file):
import timeit

for fn, number in [
    ('test_100_lines.txt', 10000),
    ('test_100000_lines.txt', 100),
]:
    for func in [
        f_jan,
        f_mateen_ulhaq_1,
        f_ralf_1,
        f_mateen_ulhaq_2,
        f_mateen_ulhaq_3,
        f_ralf_2,
    ]:
        # Total time for `number` calls of func(fn)
        t = timeit.timeit('func(fn)', 'from __main__ import fn, func', number=number)
        print('{:25s} {:20s} {:10.4f} seconds'.format(fn, func.__name__, t))
The fastest solution for both small and large input is f_ralf_1 (a list comprehension without .strip(), just comparing against \n):
test_100_lines.txt        f_jan                    0.5019 seconds
test_100_lines.txt        f_mateen_ulhaq_1         0.4483 seconds
test_100_lines.txt        f_ralf_1                 0.3657 seconds
test_100_lines.txt        f_mateen_ulhaq_2         0.4523 seconds
test_100_lines.txt        f_mateen_ulhaq_3         0.3854 seconds
test_100_lines.txt        f_ralf_2                 0.3886 seconds
test_100000_lines.txt     f_jan                    3.1178 seconds
test_100000_lines.txt     f_mateen_ulhaq_1         2.6396 seconds
test_100000_lines.txt     f_ralf_1                 1.8084 seconds
test_100000_lines.txt     f_mateen_ulhaq_2         2.7143 seconds
test_100000_lines.txt     f_mateen_ulhaq_3         2.0398 seconds
test_100000_lines.txt     f_ralf_2                 2.0246 seconds
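The numbers above come from a single timeit run per function, so some of the smaller gaps may be noise. A common way to reduce that is `timeit.repeat`, taking the minimum of several runs. The sketch below is not part of the original benchmark; it re-declares f_ralf_1 and writes a small temporary input file (made up for this example) so that it runs on its own:

```python
import os
import tempfile
import timeit

def f_ralf_1(filename):
    # Same approach as f_ralf_1 above: skip lines that are exactly '\n'.
    with open(filename) as f:
        return [line.split() for line in f if line != '\n']

# Hypothetical small input file, created only for this sketch.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('A B 2\nA E 2\n\nA W 1\n' * 25)
    path = tmp.name

try:
    # repeat() returns one total time per run of `number` calls.
    times = timeit.repeat(lambda: f_ralf_1(path), number=1000, repeat=5)
    # The minimum is usually the least disturbed by other system activity.
    best = min(times)
    print('{:20s} best of {}: {:.4f} seconds'.format(
        f_ralf_1.__name__, len(times), best))
finally:
    os.remove(path)
```

Passing a callable instead of a statement string also avoids the `from __main__ import ...` setup, at the cost of a small per-call overhead.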