
Python: Read whitespace separated strings from file similar to readline

In Python, f.readline() returns the next line from the file f. That is, it starts at the current position of f, reads until it encounters a line break, returns everything in between and updates the position of f.

Now I want to do exactly the same, but for whitespace-separated files (not only newlines). For example, consider a file f with the content

token1 token2

token3                            token4


         token5

So I'm looking for some function readtoken() such that after opening f, the first call of f.readtoken() returns token1, the second call returns token2, etc.

For efficiency and to avoid problems with very long lines or very large files, there should be no buffering.

I was almost sure that this should be possible "out of the box" with the standard library. However, I didn't find any suitable function or a way to redefine the delimiters for readline().

You'd need to create a wrapper function; this is easy enough:

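def read_by_tokens(fileobj):
    # Iterate over the file line by line; only one line is held at a time.
    for line in fileobj:
        # split() without arguments splits on any run of whitespace
        # and drops empty strings, so blank lines yield nothing.
        for token in line.split():
            yield token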

Note that .readline() doesn't just read a file character by character until a newline is encountered; the file is read in blocks (a buffer) to improve performance.

The function above reads the file line by line but yields each whitespace-separated token individually. Use it like this:

with open('somefilename') as f:
    for token in read_by_tokens(f):
        print(token)
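
As a quick check against the sample content from the question, here is a small self-contained sketch. It uses io.StringIO as a stand-in for a real file (an assumption made only so the example runs without touching disk) and repeats the generator in a slightly shorter form using yield from:

```python
import io

def read_by_tokens(fileobj):
    # Same generator as above, written with "yield from" for brevity.
    for line in fileobj:
        yield from line.split()

# In-memory stand-in for the file shown in the question.
sample = io.StringIO(
    "token1 token2\n"
    "\n"
    "token3                            token4\n"
    "\n"
    "\n"
    "         token5\n"
)

tokens = list(read_by_tokens(sample))
print(tokens)  # ['token1', 'token2', 'token3', 'token4', 'token5']
```

Blank lines and runs of spaces produce no empty tokens, because str.split() with no arguments collapses consecutive whitespace.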

Because read_by_tokens() is a generator, you either need to loop directly over the function result, or use the next() function to get tokens one by one:

with open('somefilename') as f:
    tokenized = read_by_tokens(f)

    # read first two tokens separately
    first_token = next(tokenized)
    second_token = next(tokenized)

    for token in tokenized:
        # loops over all tokens *except the first two*
        print(token)
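
The line-based generator still holds one whole line in memory, which may matter for the "very long lines" case mentioned in the question. A possible variant, sketched here under the assumption that the file is opened in text mode, reads fixed-size chunks instead and carries over a token that straddles a chunk boundary. The name read_tokens_chunked and the chunk_size parameter are hypothetical and not part of the standard library:

```python
import io

def read_tokens_chunked(fileobj, chunk_size=4096):
    """Yield whitespace-separated tokens, reading fixed-size chunks, not lines."""
    pending = ""  # tail of the previous chunk that may be an unfinished token
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty string signals end of file
            break
        data = pending + chunk
        parts = data.split()
        if data[-1].isspace():
            # The chunk ended on whitespace, so every part is a complete token.
            pending = ""
        else:
            # The last part may continue in the next chunk; hold it back.
            pending = parts.pop()
        yield from parts
    if pending:
        # The file ended without trailing whitespace; emit the final token.
        yield pending

# Tiny chunk size to force tokens to straddle chunk boundaries.
demo = io.StringIO("token1 token2\n\ntoken3      token4\n\n\n     token5")
tokens = read_tokens_chunked(demo, chunk_size=3)

first = next(tokens)
rest = list(tokens)
# next() accepts a default, which avoids StopIteration once tokens run out.
exhausted = next(tokens, None)
print(first, rest, exhausted)
```

Memory use is then bounded by the chunk size plus the longest single token rather than by the longest line. The file object itself still buffers its reads internally, as noted above for .readline().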


 