Elegant way to process and skip first line(s) (containing metadata) when reading an array from file in Python (possibly pandas)?

Question

Suppose I have a file like this, bla.txt:

    21        27       268       288       
55.1   21.2   25.5   23.5   22.3   20.8
28.3   27.1   27.2   26.   25.   23.1 
29.8   28.3   29.0   28.6   27.2   24.4

The first line contains metadata that I would like to use later in my script (a, b, c, d in the script below); the rest is a plain array that is easy to read. Is there a way to process this first line while skipping it when the array is read?

In other words, how to do the following in a more elegant/pythonic way?

import numpy as np

fname = 'bla.txt'

with open(fname) as f:
    lines = f.readlines()
    a, b, c, d = [float(x) for x in lines[0].split()]

myarray = np.loadtxt(fname, skiprows=1)

EDIT: Solutions with pandas are welcome. [Ideally, a solution able to process and skip more than one metadata line would be perfect.]

Answer 1

You can tell numpy.loadtxt to skip rows.

>>> import numpy as np
>>> np.loadtxt('bla.txt', skiprows=1)
array([[55.1, 21.2, 25.5, 23.5, 22.3, 20.8],
       [28.3, 27.1, 27.2, 26. , 25. , 23.1],
       [29.8, 28.3, 29. , 28.6, 27.2, 24.4]])

You can get the first line of any file without numpy like this:

>>> with open('bla.txt') as f:
...     line1 = next(f)
... 
>>> line1
'    21        27       268       288       \n'

If your header line had as many values as the data rows, you could also read the whole file with loadtxt and then slice the array into a header part and a data part.
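As a rough sketch of that slicing idea, assuming a hypothetical variant of the file whose header has been padded with two zeros so that every row has six values (the padded header and the io.StringIO stand-in are illustrative, not part of the original bla.txt):

```python
import io

import numpy as np

# Hypothetical variant of bla.txt: the header row is padded to six values
sample = io.StringIO(
    "21     27     268    288    0      0\n"
    "55.1   21.2   25.5   23.5   22.3   20.8\n"
    "28.3   27.1   27.2   26.    25.    23.1\n"
    "29.8   28.3   29.0   28.6   27.2   24.4\n"
)

full = np.loadtxt(sample)          # every row has the same width, so this works
header, data = full[0], full[1:]   # first row is metadata, the rest is the array
a, b, c, d = header[:4]            # ignore the padding values
```

With the real bla.txt this does not apply, because the four-value header makes the rows ragged.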

You did not tag it, but I recommend using pandas for convenience in your specific case.

>>> import pandas as pd
>>> df = pd.read_csv('bla.txt', sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
>>> line1 = list(df.columns)
>>> data = df.reset_index().values
>>> 
>>> line1
['21', '27', '268', '288']
>>> data
array([[55.1, 21.2, 25.5, 23.5, 22.3, 20.8],
       [28.3, 27.1, 27.2, 26. , 25. , 23.1],
       [29.8, 28.3, 29. , 28.6, 27.2, 24.4]])

Answer 2

Files are iterators over their lines, so you can call next(f) to get the first line and advance the iterator to the second one. Then, still inside the with block, you can pass the open file f to numpy.loadtxt(), which starts reading from the second line:

import numpy as np

fname = 'bla.txt'

with open(fname) as f:
    first = next(f)
    a, b, c, d = [float(x) for x in first.split()]
    myarray = np.loadtxt(f) # no skiprows here
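The same idea should extend to several metadata lines by consuming them with itertools.islice before handing the file to loadtxt. The following is a minimal sketch; the helper name read_with_metadata, the second metadata line, and the io.StringIO stand-in for open('bla.txt') are all made up for illustration.

```python
import io
from itertools import islice

import numpy as np


def read_with_metadata(f, n_meta=1):
    """Parse the first n_meta lines of an open text file as floats, then load the array."""
    # islice consumes exactly n_meta lines and leaves f positioned right after them
    meta = [[float(x) for x in line.split()] for line in islice(f, n_meta)]
    data = np.loadtxt(f)  # no skiprows: the metadata lines are already consumed
    return meta, data


# Stand-in for open('bla.txt'), with a hypothetical second metadata line added
sample = io.StringIO(
    "    21        27       268       288\n"
    "    1.5       2.5\n"
    "55.1   21.2   25.5   23.5   22.3   20.8\n"
    "28.3   27.1   27.2   26.   25.   23.1\n"
    "29.8   28.3   29.0   28.6   27.2   24.4\n"
)
meta, myarray = read_with_metadata(sample, n_meta=2)
(a, b, c, d), extra = meta
```

The metadata lines may each have a different number of values, since they are parsed separately from the array.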

Answer 3

Just use lines = f.readlines()[1:] instead of lines = f.readlines().
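One way this suggestion could be combined with the script from the question, as a sketch: keep the full list of lines, parse the first one for the metadata, and pass the remaining lines to np.loadtxt, which accepts a list of strings as well as a filename. The io.StringIO object below stands in for open('bla.txt') and is only there to make the example self-contained.

```python
import io

import numpy as np

# Stand-in for open('bla.txt')
sample = io.StringIO(
    "    21        27       268       288       \n"
    "55.1   21.2   25.5   23.5   22.3   20.8\n"
    "28.3   27.1   27.2   26.   25.   23.1 \n"
    "29.8   28.3   29.0   28.6   27.2   24.4\n"
)

lines = sample.readlines()                         # the file is read only once
a, b, c, d = [float(x) for x in lines[0].split()]  # metadata from the first line
myarray = np.loadtxt(lines[1:])                    # remaining lines form the array
```

This holds the whole file in memory as a list of strings, which is fine for small files but less attractive for large ones.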

3 answers

solution1
2 2018-09-04 10:22:48

solution2
1 ACCEPTED 2018-09-04 10:35:40

solution3
0 2018-09-04 10:22:43
