
How to split the file content by space and end-of-line character?

When I do the following list comprehension I end up with nested lists:

channel_values = [x for x in [ y.split(' ') for y in
    open(channel_output_file).readlines() ] if x and not x == '\n']

Basically I have a file composed of this:

7656 7653 7649 7646 7643 7640 7637 7634 7631 7627 7624 7621 7618 7615
8626 8623 8620 8617 8614 8610 8607 8604 8600 8597 8594 8597 8594 4444
<snip several thousand lines>

Where each line of this file is terminated by a new line.

Basically I need to add each number (they are all separated by a single space) into a list.

Is there a better way to do this via list comprehension?

You don't need a list comprehension for this:

channel_values = open(channel_output_file).read().split()

Just do this:

channel_values = open(channel_output_file).read().split()

split() with no argument splits on any run of whitespace, which includes ' ', '\t' and '\n'. All the values end up in one flat list.
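As a quick illustration of that behaviour, here is a minimal sketch comparing split() with split(' '). The string `sample` is made up for the example and simply stands in for the file's contents:

```python
# Hypothetical sample standing in for the contents of the file.
sample = "7656 7653 7649\n8626 8623 8620\n"

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
by_whitespace = sample.split()

# With an explicit ' ' separator, newlines are ordinary characters and
# stay attached to the neighbouring tokens.
by_space = sample.split(' ')

print(by_whitespace)
print(by_space)
```

This is why the filter on '\n' in the question is not enough once you split on ' ' explicitly: the newline is usually glued to a number rather than standing alone.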

If you want integer values you can do this (in Python 3, map returns an iterator, so wrap it in list):

channel_values = list(map(int, open(channel_output_file).read().split()))

or with list comprehensions:

channel_values = [int(x) for x in open(channel_output_file).read().split()]
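If you also want the file closed promptly, the same idea can be wrapped in a with block. This is only a sketch: the helper name `read_ints` and the temporary demo file are assumptions made for the example, not part of the question.

```python
import os
import tempfile


def read_ints(path):
    """Return every whitespace-separated token in the file as an int."""
    # The with block closes the file as soon as the read is done.
    with open(path) as f:
        return [int(token) for token in f.read().split()]


# Build a small throwaway file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("7656 7653\n8626 8623\n")
    demo_path = tmp.name

channel_values = read_ints(demo_path)
os.remove(demo_path)

print(channel_values)
```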

Also, the original list comprehension produced nested lists because the inner set of square brackets added an extra level of list comprehension. You meant this, with the for clauses written in loop order (split() with no argument also drops the trailing newline, so no filter is needed):

channel_values = [x for y in open(channel_output_file)
    for x in y.split()]

The other answers are still better ways to write the code, but that was the cause of the problem.
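To make the cause concrete, here is a small sketch that contrasts the two shapes on an in-memory list. The list `lines` is invented for the example and stands in for what iterating over the file would yield:

```python
# Hypothetical lines, as they would come from iterating over the file.
lines = ["1 2 3\n", "4 5 6\n"]

# An inner comprehension in its own square brackets builds one list per
# line, so the outer comprehension just collects those lists.
nested = [x for x in [y.split() for y in lines]]

# Two for clauses in a single comprehension flatten the result.
# The clauses read left to right, like nested for loops.
flat = [x for y in lines for x in y.split()]

print(nested)
print(flat)
```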

Another problem is that you're leaving the file open. Note that in Python 2 open is effectively an alias for file; Python 3 only has open, so that is what is used here.

Try this:

f = open(channel_output_file)
channel_values = f.read().split()
f.close()

Note that these will be string values, so if you want integers change the second line to:

channel_values = [int(x) for x in f.read().split()]

int(x) will raise a ValueError if the file contains a non-integer value.
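If stray non-numeric tokens are possible, one option is to catch the ValueError per token instead of letting the whole read fail. The following is a sketch only: `parse_ints` is a hypothetical helper, and whether bad tokens should be collected, skipped, or treated as fatal depends on your data.

```python
def parse_ints(text):
    """Split text on whitespace and convert each token to int.

    Returns (values, rejected): the parsed integers and the tokens
    that int() could not convert.
    """
    values, rejected = [], []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            # Keep the offending token so it can be reported later.
            rejected.append(token)
    return values, rejected


# Made-up input with two bad tokens mixed in.
values, rejected = parse_ints("7656 7653 n/a 7649\n8626 x 8620\n")
print(values)
print(rejected)
```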

Is there a better way to do this via list comprehension?

Sort of..

Instead of reading the file as a list of lines with .readlines(), you can read it as a single string with .read(). Splitting with no argument then discards the spaces and newlines, so no filter is needed:

channel_values = [x for x in open(channel_output_file).read().split()]

If you need to do anything more complicated, particularly if it involves multiple list comprehensions, you're almost always better off expanding it into a regular for loop.

out = []
with open(channel_output_file) as f:
    for y in f:
        # split() with no argument drops spaces and the trailing newline
        for x in y.split():
            out.append(x)

Or using a for loop and a list-comprehension:

out = []
with open(channel_output_file) as f:
    for y in f:
        # split() with no argument drops spaces and the trailing newline
        out.extend([x for x in y.split()])

Basically, if you can't do something simply with a list comprehension (or need to nest them), list-comprehensions are probably not the best solution.

If you don't care about dangling file references, and you really must have a list read into memory all at once, the one-liner mentioned in other answers does work:

channel_values = open(channel_output_path).read().split()

In production code, I would probably use a generator. Why read all those lines if you don't need them?

def generate_values_for_filename(filename):
    with open(filename) as f:
        for line in f:
            for value in line.split():
                yield value

You can always make a list later if you really need to do something other than iterate over values:

channel_values = list(generate_values_for_filename(channel_output_path))
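As a sketch of what the lazy version buys you, the example below takes only the first few values and then sums the whole file without ever building a full list. The generator is repeated so the block runs on its own; the temporary file and the names `demo_path`, `first_four` and `total` are assumptions made for the demonstration.

```python
import itertools
import os
import tempfile


def generate_values_for_filename(filename):
    # Same generator as above: yields one token at a time.
    with open(filename) as f:
        for line in f:
            for value in line.split():
                yield value


# Build a small throwaway file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 5 6\n7 8 9\n")
    demo_path = tmp.name

# Take only the first four values; the rest of the file is never split.
gen = generate_values_for_filename(demo_path)
first_four = list(itertools.islice(gen, 4))
gen.close()  # stopping early: this exits the with block and closes the file

# Aggregate over every value without holding them all in memory.
total = sum(int(v) for v in generate_values_for_filename(demo_path))

os.remove(demo_path)
print(first_four, total)
```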

