
Removing whitespace while reading in a file

with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()

Can anyone break down the line where the whitespace gets removed? I understand that line.strip().split() first removes leading and trailing whitespace from line, then splits the resulting string on whitespace and stores all the words in a list.

But what does the remaining code do?

The expression ' '.join(line.strip().split()) builds a string from all the list elements, separated by exactly one space character. Calling split() on that string then returns a list of the words again, splitting on those single spaces.

Here's a breakdown:

# Opens the file
with open(filename, "r") as f:
    # Iterates through each line
    for line in f:
        # Rewriting this line, below:
        # line = (' '.join(line.strip().split())).split()

        # Assuming line was "  foo bar   quux  "
        stripped_line = line.strip()     # "foo bar   quux"
        parts = stripped_line.split()    # ["foo", "bar", "quux"]
        joined = ' '.join(parts)         # "foo bar quux"
        parts_again = joined.split()     # ["foo", "bar", "quux"]
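To check the breakdown without a real file, the sketch below feeds a few made-up lines through io.StringIO (standing in for the file behind filename) and confirms that the one-liner and the step-by-step version agree. Note that a line read from a file keeps its trailing newline, which strip() (or split()) also removes.

```python
import io

# Sample text standing in for the file contents (made up for illustration).
sample = io.StringIO("  foo bar   quux  \n\tspam\t\teggs\n\n")

results = []
for line in sample:
    # The original one-liner.
    one_liner = (' '.join(line.strip().split())).split()

    # The same thing, one step at a time.
    stripped_line = line.strip()
    parts = stripped_line.split()
    joined = ' '.join(parts)
    parts_again = joined.split()

    assert one_liner == parts_again
    results.append(parts_again)

print(results)  # [['foo', 'bar', 'quux'], ['spam', 'eggs'], []]
```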

Is this what you were looking for?

That code is pointlessly complicated is what it is.

There is no need to strip if you're calling split() with no arguments next (no-arg split drops leading and trailing whitespace as a side effect), so line.strip().split() can be simplified to line.split().
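A quick way to see why the strip() is redundant: no-argument split() treats any run of whitespace as a single separator and ignores leading and trailing whitespace, whereas an explicit separator such as split(' ') does not. The sample strings are arbitrary.

```python
samples = ["  foo bar   quux  \n", "\tspam\t\teggs", "", "   "]

for s in samples:
    # No-arg split already discards leading/trailing whitespace,
    # so stripping first changes nothing.
    assert s.strip().split() == s.split()

# With an explicit separator, repeated and leading/trailing spaces
# produce empty strings, so strip() would matter there.
print("  a  b ".split(' '))  # ['', '', 'a', '', 'b', '']
print("  a  b ".split())     # ['a', 'b']
```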

The join and re-split don't change a thing: join sticks the pieces from the first split back together with spaces, and then split re-splits on those very same spaces. So you could skip the time spent joining only to split again, keep the result of the first split, and change the line to:

line = line.split()

and it would be functionally identical to the original:

line = (' '.join(line.strip().split())).split()

and faster to boot. I'm guessing the code you were handed was written by someone who didn't understand splitting and joining either, and just threw stuff at their problem without understanding what it did.
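The equivalence claim is easy to spot-check, and timeit gives a rough feel for the cost of the extra join and split. The test strings and the helper names original/simplified are made up for this sketch; timings vary by machine and Python version, so none are shown.

```python
import timeit

def original(line):
    # The one-liner from the question.
    return (' '.join(line.strip().split())).split()

def simplified(line):
    # The suggested replacement.
    return line.split()

# Includes tabs, vertical tab, form feed, CR/LF, and empty/blank lines.
tests = ["  foo bar   quux  \n", "\t\ta\vb\fc\r\n", "", " \n", "single"]
for t in tests:
    assert original(t) == simplified(t)

line = "  foo bar   quux  " * 20
print("original:  ", timeit.timeit(lambda: original(line), number=10_000))
print("simplified:", timeit.timeit(lambda: simplified(line), number=10_000))
```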

Here is an explanation of the code:

with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()

First, line.strip() removes leading and trailing whitespace from line, and .split() breaks the result into a list on whitespace.

Then ' '.join converts that list back into a single string with the words separated by single spaces. Finally, .split() converts it into a list again.

This code, line = (' '.join(line.strip().split())).split(), is superfluous. It should simply be:

line = line.split()

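If you still want to strip each word (redundant, since no-arg split() never returns words with surrounding whitespace), use the following; list() is needed in Python 3, where map() is lazy:

line = list(map(str.strip, line.split()))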
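A small check of both points, using an arbitrary sample line: map() hands back a map object rather than a list, and stripping the words changes nothing.

```python
line = "  foo\tbar   quux \n"

words = line.split()
stripped = map(str.strip, line.split())

# map() is lazy in Python 3; wrap it in list() to compare or index it.
print(type(stripped).__name__)   # map
stripped = list(stripped)

# Each element was already free of surrounding whitespace.
assert stripped == words == ['foo', 'bar', 'quux']
```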

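I think they are doing this to maintain a constant amount of whitespace. The strip and split remove every run of whitespace (which could be five spaces and a tab), and the join then puts back a single space in its place. Note, though, that the final split() discards those single spaces again, so the normalization only survives if the joined string is kept.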
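If the goal really is to collapse each run of whitespace into a single space, the joined string has to be kept rather than split again. A minimal sketch, where normalize_spaces is a hypothetical helper name:

```python
def normalize_spaces(line):
    # Collapse every run of whitespace (spaces, tabs, newlines) into one
    # space and drop leading/trailing whitespace; the result is a string.
    return ' '.join(line.split())

print(repr(normalize_spaces("  foo \t\t bar     quux  \n")))  # 'foo bar quux'
```

Only split the result afterwards if a list of words is what you actually need.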
