
Elegant way to process and skip first line(s) (containing metadata) when reading an array from file in Python (possibly pandas)?

Suppose I have a file like this, bla.txt:

    21        27       268       288       
55.1   21.2   25.5   23.5   22.3   20.8
28.3   27.1   27.2   26.   25.   23.1 
29.8   28.3   29.0   28.6   27.2   24.4 

The first line contains metadata that I would like to use later in my script (a, b, c, d in my script below); the rest is a plain array, which is easy to read. Is there a way to process this first line and skip it while reading the array?

In other words, how to do the following in a more elegant/pythonic way?

import numpy as np

fname = 'bla.txt'

with open(fname) as f:
    lines = f.readlines()
    a, b, c, d = [float(x) for x in lines[0].split()]

myarray = np.loadtxt(fname, skiprows=1)

EDIT: Solutions with pandas are welcome. [Note that ideally, a solution able to process and skip more than one metadata line would be perfect.]

You can tell numpy.loadtxt to skip rows.

>>> import numpy as np
>>> np.loadtxt('bla.txt', skiprows=1)
array([[55.1, 21.2, 25.5, 23.5, 22.3, 20.8],
       [28.3, 27.1, 27.2, 26. , 25. , 23.1],
       [29.8, 28.3, 29. , 28.6, 27.2, 24.4]])

You can get the first line of any file without numpy with

>>> with open('bla.txt') as f:
...     line1 = next(f)
... 
>>> line1
'    21        27       268       288       \n'

If your header line had as many values as the data rows (here it has four against six), you could also just read in the whole file with loadtxt and then slice the array into a header part and a data part.
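The slicing idea can be sketched as follows. This is only an illustration under an assumption that does not hold for bla.txt: the file contents below are hypothetical, with a header row padded to six values so that loadtxt can parse every row into one rectangular array. io.StringIO stands in for the file on disk.

```python
import io

import numpy as np

# Hypothetical file contents: the header row has six values, like the data rows.
text = io.StringIO(
    "21 27 268 288 0 0\n"
    "55.1 21.2 25.5 23.5 22.3 20.8\n"
    "28.3 27.1 27.2 26. 25. 23.1\n"
)

full = np.loadtxt(text)  # every row parsed into one 2-D array
header = full[0]         # first row: the metadata
data = full[1:]          # remaining rows: the actual array
```

With the real bla.txt this would presumably fail, because the first row has fewer columns than the others.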

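You did not tag it, but I recommend using pandas for convenience in your specific case. Because the header line has fewer fields than the data rows, read_csv treats the extra leading columns as the index, so reset_index() is needed to bring them back as data columns.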

>>> import pandas as pd
>>> df = pd.read_csv('bla.txt', sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
>>> line1 = list(df.columns)
>>> data = df.reset_index().values
>>> 
>>> line1
['21', '27', '268', '288']
>>> data
array([[55.1, 21.2, 25.5, 23.5, 22.3, 20.8],
       [28.3, 27.1, 27.2, 26. , 25. , 23.1],
       [29.8, 28.3, 29. , 28.6, 27.2, 24.4]])

Files are iterators over their lines, so you can just use next(f) to get the first line and, at the same time, advance the iterator to the second one. Then (still inside the with block) you can pass the opened file f to numpy.loadtxt(), so it only starts reading from the second line:

import numpy as np

fname = 'bla.txt'

with open(fname) as f:
    first = next(f)
    a, b, c, d = [float(x) for x in first.split()]
    myarray = np.loadtxt(f) # no skiprows here

Just use lines = f.readlines()[1:] instead of lines = f.readlines().
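One way to read the readlines()-based suggestion is to load all lines once, parse the first one by hand, and pass the remaining lines to np.loadtxt, which also accepts a list of strings. This is a sketch of that reading, not necessarily what the answerer had in mind; io.StringIO stands in for open('bla.txt') so the example is self-contained.

```python
import io

import numpy as np

# io.StringIO stands in for open('bla.txt'); contents copied from the question.
f = io.StringIO(
    "    21        27       268       288       \n"
    "55.1   21.2   25.5   23.5   22.3   20.8\n"
    "28.3   27.1   27.2   26.   25.   23.1 \n"
    "29.8   28.3   29.0   28.6   27.2   24.4 \n"
)

lines = f.readlines()                              # the whole file, read once
a, b, c, d = [float(x) for x in lines[0].split()]  # metadata from line 1
myarray = np.loadtxt(lines[1:])                    # parse only the data lines
```

Compared with the next(f) approach, this holds the whole file in memory as a list of strings, which is fine for small files but less attractive for large ones.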

