
.readlines() list in a file not indexing values

I have a txt file with content in the form of lists like this:

[1,2,3,4]
[5,6,7,8]

I've put these lists into a list using the following code:

t = open('filename.txt', 'r+w')
contents = t.readlines()

alist = []

for i in contents:
    alist.append(i)

When I run

alist[0]

I get

[1,2,3,4]

but when I run

for a in alist:
    print a[0]

I get

[

instead of the first value in the list.

.readlines() reads lines as strings. The first character of that string is a [.

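If you want to read the text file and "deserialize" it into data structures, the easiest way is to use Python's built-in eval() function, but eval() will run any expression it is given. A safer way is to use ast.literal_eval(), which only accepts literals (numbers, strings, tuples, lists, dicts, sets, booleans and None) and never executes code.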

http://docs.python.org/2/library/ast.html?highlight=literal#ast.literal_eval
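
As a rough illustration of why ast.literal_eval() is the safer choice, here is a small sketch that feeds it one line holding a plain literal and one line holding a function call. Both sample strings are made up for this example and are not from the question's file.

```python
import ast

# A line containing only a Python literal parses into a real list.
parsed = ast.literal_eval("[1,2,3,4]\n")
print(parsed[0])  # prints: 1

# A line containing a function call is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # prints: True
```

Plain eval() would have run the second string, which is exactly what you don't want with a file you didn't write yourself.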

Suggested code:

import ast

with open("filename.txt") as f:
    alist = [ast.literal_eval(line) for line in f]

print(type(alist[0]))  # prints: <type 'list'> (Python 2) or <class 'list'> (Python 3)
print(alist[0])  # prints: [1, 2, 3, 4]

We almost never want to call .readlines(); it slurps in all the lines from the file, so if the file is very large, it will cause your program's memory usage to go way up. An open file handle object (in my example, f) can be used as an iterator, and it will yield up one line from the file each time it is iterated. So a for loop or a list comprehension will pull one line at a time from the file. Thus, this example program does not keep the whole file in memory; it keeps just one line at a time, while building the list. If this program called .readlines() it would keep all the lines and also the list, so the peak memory usage would be higher. (It doesn't matter for such a small input file as this example, of course. But it's easy to do things the memory-efficient way, so why not?)
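
To make the "one line at a time" behaviour visible, here is a minimal sketch. It writes a throwaway temporary file (an assumption made only so the snippet is self-contained) and then pulls lines from the open file object by hand.

```python
import os
import tempfile

# Write a small sample file so the sketch does not depend on filename.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[1,2,3,4]\n[5,6,7,8]\n")
    path = tmp.name

with open(path) as f:
    first = next(f)       # reads only the first line
    remaining = list(f)   # the iterator continues where it left off

os.remove(path)
print(repr(first))  # prints: '[1,2,3,4]\n'
print(remaining)    # prints: ['[5,6,7,8]\n']
```

A for loop or a comprehension does the same thing as next(f), just repeatedly, which is why no call to .readlines() is needed.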

It is always good practice to use with to open a file. Then you know the file will be properly closed when you are done with it.

We use a list comprehension to build a list of the results of ast.literal_eval(), which for the given input file returns a list per line, so alist will be a list of lists.

If you just inherited or downloaded these files and can't do anything about the format, and you know they're supposed to be treated as lines of Python lists, ast.literal_eval is the best answer, as steveha explained:

import ast

t = open('filename.txt', 'r')
alist = []
for i in t:  # iterate over the file itself, one line at a time
    alist.append(ast.literal_eval(i))

If you inherited or downloaded these files, and are just guessing at the format, it's possible that they're actually intended to be read as lines of JSON, because they definitely are valid JSON just as they are valid Python literals. In that case:

import json

t = open('filename.txt', 'r')
alist = []
for i in t:  # iterate over the file itself, one line at a time
    alist.append(json.loads(i))
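
For lines like the ones in the question, the two parsers agree. They stop agreeing once the data contains booleans or nulls, which JSON and Python spell differently. A small sketch, using made-up sample strings:

```python
import ast
import json

line = "[1,2,3,4]\n"
# For lists of integers, both parsers give the same result.
print(json.loads(line) == ast.literal_eval(line))  # prints: True

# JSON spells booleans and null differently from Python.
print(json.loads("[true, null]"))  # prints: [True, None]
try:
    ast.literal_eval("[true, null]")
    python_accepts_json_keywords = True
except ValueError:
    python_accepts_json_keywords = False
print(python_accepts_json_keywords)  # prints: False
```

So if you are guessing at the format, a line containing true, false or null is a strong hint that the file is JSON.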

But if you're the one who created these files in the first place, you should instead create them in a way that's designed for serialization.

For example, instead of this:

t = open('filename.txt', 'w')
for i in alist:
    print >>t, i  # Python 2 syntax; in Python 3 this is print(i, file=t)

Do something like this:

import json
t = open('filename.txt', 'w')
json.dump(alist, t)
t.close()  # make sure the data is flushed to disk

Then you can write your reading code like this:

t = open('filename.txt', 'r')
alist = json.load(t)

The whole point of serialization formats like JSON, YAML, or Pickle is that they're specifically designed so that you can write a value and later read back that same value.
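
Here is a short round-trip sketch of that idea with JSON. The temporary path is an assumption standing in for filename.txt, so the snippet can run anywhere.

```python
import json
import os
import tempfile

original = [[1, 2, 3, 4], [5, 6, 7, 8]]

# A temporary path stands in for filename.txt.
path = os.path.join(tempfile.mkdtemp(), "lists.json")
with open(path, "w") as out:
    json.dump(original, out)
with open(path) as src:
    restored = json.load(src)

print(restored == original)  # prints: True
print(restored[0][0])        # prints: 1
```

Note that the whole list of lists is written as one JSON document, so there is no per-line parsing at all when reading it back.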

Functions like print, str, etc. are not designed for that; they're designed so you can display a value in the nicest human-readable form, even if that's difficult or impossible to read back later.

The function repr is somewhere in between. It's designed to be readable to humans playing with the interactive prompt, so if possible it gives you a string that you could type into the prompt to get the same value back. This means that, in some cases, ast.literal_eval is the inverse of repr, just as json.load is the inverse of json.dump. But you shouldn't rely on this, even when dealing with types where it works.
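
One case where repr and ast.literal_eval stop being inverses, sketched with values chosen just for this example: a float infinity has a repr that is not a Python literal.

```python
import ast

value = [1, "two", 3.0]
print(ast.literal_eval(repr(value)) == value)  # prints: True

tricky = [float("inf")]
print(repr(tricky))  # prints: [inf]
try:
    ast.literal_eval(repr(tricky))
    round_trips = True
except ValueError:
    round_trips = False
print(round_trips)  # prints: False
```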


A few side notes about your code:

t = open('filename.txt', 'r+w')

If you're only going to read the file, don't try to open it for writing. Also, if you do want to open for both reading and writing, the right mode string is r+, not r+w. (The way you've done it is technically an error; Python 2 typically ignores the extra w, so you get away with it, but Python 3 raises a ValueError.)

And if the mode is r, you don't need to specify it at all, because that's the default.
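
A quick sketch of how these mode strings behave, assuming Python 3 (the question's code is Python 2, where the results can differ). The temporary file is only there to make the snippet self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write("[1,2,3,4]\n")

# "r+" opens an existing file for both reading and writing.
with open(path, "r+") as f:
    print(f.readable(), f.writable())  # prints: True True

# Omitting the mode is the same as passing "r".
with open(path) as f:
    print(f.mode)  # prints: r

# Python 3 rejects the combined "r+w" string outright.
try:
    open(path, "r+w")
    accepted = True
except ValueError:
    accepted = False
print(accepted)  # prints: False
```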

Meanwhile, you never close the file. The easiest way to do this is to use a with statement.
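
A minimal sketch of what the with statement buys you, using a temporary file as a stand-in for filename.txt: the file is open inside the block and closed as soon as the block ends, even if an exception is raised inside it.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write("[1,2,3,4]\n")

with open(path) as textfile:
    inside = textfile.closed  # still open while inside the block
after = textfile.closed       # closed automatically on leaving the block
print(inside, after)  # prints: False True
```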

contents = t.readlines()

There is almost never a good reason to call readlines(). This gives you a sequence of lines, but the file itself can already be iterated line by line. All you're doing is making an extra copy of it.

alist = []

for i in contents:
    alist.append(i)

This pattern—creating an empty list and then appending to it in a loop—is so common that Python has a shortcut to it, called a list comprehension. Comprehensions are less verbose, more readable, harder to get wrong, and faster than explicit loops, so it's worth using them most of the time.
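
To show the two forms side by side, here is a small sketch. The sample lines and the .strip() call are assumptions added only so the loop has something to do.

```python
lines = ["[1,2,3,4]\n", "[5,6,7,8]\n"]

# Explicit loop: create an empty list, then append in a loop.
looped = []
for line in lines:
    looped.append(line.strip())

# List comprehension: the same result in one expression.
comprehended = [line.strip() for line in lines]

print(looped == comprehended)  # prints: True
print(comprehended)  # prints: ['[1,2,3,4]', '[5,6,7,8]']
```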

Finally, it's better to give meaningful names to your variables. Especially if you want someone else (or yourself, six months later) to be able to debug your code. If it's working perfectly, we can tell what the variables mean by what they do—but if it's not, we can't fix it unless we can guess what they're supposed to mean, and names are the best way to signal that.

So, putting it all together, your original code could be written as:

with open('filename.txt') as textfile:
    alist = [line for line in textfile]

And the various fixed versions are:

with open('filename.txt') as textfile:
    alist = [ast.literal_eval(line) for line in textfile]

with open('filename.txt') as textfile:
    alist = [json.loads(line) for line in textfile]

with open('filename.txt') as textfile:
    alist = json.load(textfile)

What you have is a list of character strings. A character string with brackets and commas in it is not magically a list; it is merely a string with brackets and commas in it.

alist is the list. In your loop, a is an item from that list: first, it is alist[0], then alist[1] and so on. Thus, a[0] is asking for alist[0][0], alist[1][0], and so on: the first character from each line. And so that's what you get.

If you want to convert it to an actual Python list, use ast.literal_eval().
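
The difference is easy to see in a short sketch. The two strings below are assumed to be what reading the question's file produces.

```python
import ast

raw_lines = ["[1,2,3,4]\n", "[5,6,7,8]\n"]

# Indexing a string gives its first character.
print([line[0] for line in raw_lines])  # prints: ['[', '[']

# Indexing the parsed list gives its first element.
parsed = [ast.literal_eval(line) for line in raw_lines]
print([row[0] for row in parsed])  # prints: [1, 5]
```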
