with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
Can anyone break down the line where whitespace gets removed? I understand that line.strip().split() first removes leading and trailing whitespace from line, then splits the resulting string on whitespace and stores all the words in a list.
But what does the remaining code do?
The line ' '.join(line.strip().split()) creates a string consisting of all the list elements separated by exactly one space character. Applying the split() method to this string again returns a list containing all the words in the string that were separated by whitespace.
Here's a breakdown:
# Opens the file
with open(filename, "r") as f:
    # Iterates through each line
    for line in f:
        # Rewriting this line, below:
        # line = (' '.join(line.strip().split())).split()
        # Assuming line was " foo bar quux "
        stripped_line = line.strip()   # "foo bar quux"
        parts = stripped_line.split()  # ["foo", "bar", "quux"]
        joined = ' '.join(parts)       # "foo bar quux"
        parts_again = joined.split()   # ["foo", "bar", "quux"]
Is this what you were looking for?
That code is pointlessly complicated is what it is.
There is no need to strip if you're no-arg splitting next (no-arg split drops leading and trailing whitespace as a side effect), so line.strip().split() can simplify to line.split().
The join and re-split don't change a thing: join sticks the results of the first split back together with single spaces, then split resplits on those very same spaces. So you could save the time spent joining only to split again and just keep the original results from the first split, changing it to:
line = line.split()
and it would be functionally identical to the original:
line = (' '.join(line.strip().split())).split()
and faster to boot. I'm guessing the code you were handed was written by someone who didn't understand splitting and joining either, and just threw stuff at their problem without understanding what it did.
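One way to check the equivalence claim is to run both forms on a few sample strings. This is only a minimal sketch: the helper names original and simplified and the sample inputs are made up for illustration and do not come from the code in the question.

```python
def original(line):
    # The expression from the question, unchanged.
    return (' '.join(line.strip().split())).split()


def simplified(line):
    # No-argument split() ignores leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return line.split()


# Illustrative inputs: mixed spaces and tabs, empty, whitespace-only, one word.
samples = ["  foo   bar\tquux \n", "", "   \t\n", "single", "a  b"]
for s in samples:
    assert original(s) == simplified(s)

print(simplified("  foo   bar\tquux \n"))  # ['foo', 'bar', 'quux']
```

Note that this only holds for split() with no arguments. With an explicit separator, as in "a  b".split(' '), consecutive spaces produce empty strings in the result, so the two forms would no longer match.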
Here is an explanation of the code:-
with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
First, line.strip() removes leading and trailing whitespace from the line, and .split() breaks it into a list on whitespace.
Then .join converts that list back into a single string with the words separated by one space each. Finally, .split converts it into a list again.
The line line = (' '.join(line.strip().split())).split() is superfluous. It should be:-
line = line.split()
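If you still want to strip each word explicitly (redundant here, since no-argument split() never leaves whitespace around a word), use:-
line = list(map(str.strip, line.split()))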
I think they are doing this to maintain a constant amount of whitespace. The split removes every run of whitespace between words (could be 5 spaces and a tab), and then the join adds back a single space in its place.
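If a constant amount of whitespace really is the goal, the result has to stay a string, which means dropping the final split(). A minimal sketch of that idea follows; the helper name normalize_whitespace and the sample input are hypothetical.

```python
def normalize_whitespace(line):
    # split() discards every run of whitespace (spaces, tabs, newlines);
    # join() then puts exactly one space between the remaining words.
    return ' '.join(line.split())


print(repr(normalize_whitespace("  foo \t  bar     quux \n")))  # 'foo bar quux'
```

In the code from the question this normalized string is split again straight away, so the single spaces never survive into the final list.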