
Reading from text file into python list

I'm very new to Python and can't understand why this isn't working. I have a list of web addresses stored one per line in a text file. I want to store the first 10 in an array/list called bing, the next 10 in a list called yahoo, and the last 10 in a list called duckgo. I'm using the readlines function to read the data from the file into each list. The problem is that nothing ends up in the lists, although the count increments as it should. If I remove the loop altogether and just read the whole text file into one list, it works perfectly, which leads me to believe that the loop is causing the problem. The code I am using is below. I would really appreciate some feedback.


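#Open the file
fo = open("results.txt", "r")  # assumed: the open() line is missing from the post

#read into each array
count = 0
while count < 30:  # assumed: the loop header is missing from the post
    if(count>=0 and count<=9):
        bing = fo.readlines()
        print bing
        print count

    elif(count>=10 and count<=19):
        yahoo = fo.readlines()
        print count

    elif(count>=20 and count<=29):
        duckgo = fo.readlines()
        print count

    count += 1

print bing
print yahoo
print duckgo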


You're using readlines to read the file. readlines reads all of the lines at once, so the very first time through your loop, you exhaust the entire file and store the result in bing. Then, on every later pass through the loop, you overwrite bing, yahoo, or duckgo with the (empty) result of the next readlines call. So your lists all wind up being empty.
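
The effect is easy to reproduce. Here is a minimal sketch (Python 3 syntax) that uses an in-memory io.StringIO as a stand-in for the opened file. The stand-in and the fake addresses are assumptions made only so the example runs without a real results.txt:

```python
import io

# Stand-in for the open file: 30 fake addresses, one per line (assumed data).
fo = io.StringIO("".join("http://example.com/%d\n" % n for n in range(30)))

first = fo.readlines()   # consumes every line in the file
second = fo.readlines()  # nothing left to read, so this is an empty list

print(len(first))  # 30
print(second)      # []
```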

There are lots of ways to fix this. Among other things, you should consider reading the file a line at a time, with readline (no 's'). Or better yet, you could iterate over the file, line by line, simply by using a for loop:

for line in fo:
    ...  # handle one line here

To keep the structure of your current code you could use enumerate:

for line_number, line in enumerate(fo):
    if condition(line_number):
        ...  # store the line in the matching list
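
A fuller sketch of the enumerate approach might look like the following (Python 3 syntax). The in-memory file, the site names, and the `lists` helper are assumptions for illustration; integer division of the line number by 10 picks which list a line belongs to:

```python
import io

# Stand-in for the open file (assumed data): 30 lines, one address each.
fo = io.StringIO("".join("site%d\n" % n for n in range(30)))

bing, yahoo, duckgo = [], [], []
lists = [bing, yahoo, duckgo]  # hypothetical helper: buckets 0, 1, 2

for line_number, line in enumerate(fo):
    bucket = line_number // 10   # 0-9 -> 0, 10-19 -> 1, 20-29 -> 2
    if bucket < len(lists):      # ignore anything past line 30
        lists[bucket].append(line.rstrip("\n"))

print(bing[0], yahoo[0], duckgo[0])  # site0 site10 site20
```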

But frankly I think you should ditch your current system. A much simpler way would be to use readlines without a loop, and slice the resulting list!

lines = fo.readlines()
bing = lines[0:10]
yahoo = lines[10:20]
duckgo = lines[20:30]

There are many other ways to do this, and some might be better, but none are simpler!
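
For reference, here is a runnable version of the slicing approach (Python 3 syntax), with two assumptions that are not in the question: io.StringIO stands in for the file, and trailing newlines are stripped. It also shows that a slice never raises when the list is short; it simply returns fewer items:

```python
import io

# Stand-in for the open file (assumed data): only 25 lines this time.
fo = io.StringIO("".join("site%d\n" % n for n in range(25)))

lines = [line.rstrip("\n") for line in fo.readlines()]
bing = lines[0:10]
yahoo = lines[10:20]
duckgo = lines[20:30]   # only 5 lines are left, so this holds 5 items

print(len(bing), len(yahoo), len(duckgo))  # 10 10 5
```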

readlines() reads all of the lines of the file. If you call it again, you get an empty list. So you are overwriting your lists with empty data as you iterate through your loop.

You should be using readline() instead of readlines().

readlines() reads the entire file in at once, whereas readline() reads a single line from the file.
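
A small sketch of the difference (Python 3 syntax), using an in-memory io.StringIO as an assumed stand-in for a real file:

```python
import io

f = io.StringIO("first\nsecond\nthird\n")  # assumed sample data

a = f.readline()      # one line, newline included
b = f.readline()      # the next line
rest = f.readlines()  # everything that is left, as a list
end = f.readline()    # empty string: end of file reached

print(repr(a))    # 'first\n'
print(repr(b))    # 'second\n'
print(rest)       # ['third\n']
print(repr(end))  # ''
```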

I suggest you rewrite it like so:

bing = []
yahoo = []
duckgo = []
with open("results.txt", "r") as f:
    for i, line in enumerate(f):
        if i < 10:
            bing.append(line)
        elif i < 20:
            yahoo.append(line)
        elif i < 30:
            duckgo.append(line)
        else:
            raise RuntimeError("too many lines in input file")

Note how we use enumerate() to get a running count of lines, rather than making our own count variable and needing to increment it ourselves. This is considered good style in Python.

But I think the best way to solve this problem would be to use itertools like so:

import itertools as it
with open("results.txt", "r") as f:
    bing = list(it.islice(f, 10))
    yahoo = list(it.islice(f, 10))
    duckgo = list(it.islice(f, 10))
    # any line left over means the file is longer than expected
    if list(it.islice(f, 1)):
        raise RuntimeError("too many lines in input file")

itertools.islice() (or it.islice(), since I did import itertools as it) pulls a specified number of items from an iterator. Our open file object f is an iterator that returns lines from the file, so it.islice(f, 10) pulls the next 10 lines from the input file (or fewer, if the file runs out first).

Because it.islice() returns an iterator, we must explicitly expand it out to a list by wrapping it in list().

I think this is the simplest way to do it. It perfectly expresses what we want: for each one, we want a list with 10 lines from the file. There is no need to keep a counter at all, just pull the 10 lines each time!
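
To see why no counter is needed, here is a small sketch (Python 3 syntax) with an in-memory io.StringIO as an assumed stand-in for the file, and chunks of 3 instead of 10 to keep the output short. Each islice call resumes where the previous one stopped, because they all draw from the same underlying iterator:

```python
import io
import itertools as it

# Stand-in for the open file (assumed data): 7 lines.
f = io.StringIO("".join("line%d\n" % n for n in range(7)))

first = list(it.islice(f, 3))
second = list(it.islice(f, 3))
third = list(it.islice(f, 3))  # only one line is left

print(first)   # ['line0\n', 'line1\n', 'line2\n']
print(second)  # ['line3\n', 'line4\n', 'line5\n']
print(third)   # ['line6\n']
```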

EDIT: The check for extra lines now uses it.islice(f, 1) so that it will only pull a single line. Even one extra line is enough to know that there are more than just the 30 expected lines, and this way if someone accidentally runs this code on a very large file, it won't try to slurp the whole file into memory.

