How to deal with metadata lines in pandas.read_csv?

Question

I have a txt file that starts with a few lines of metadata, followed by the actual data in CSV style. The data contains floats written with decimal commas. Like this:

title = someTitle
date = 20.0.2019
col= str1 str2 str3
2,49 42,01 -0,50
5,74 11,03 -0,43
....

I need all of this information in pandas (0.24.0) and want the data parsed as floats.

df = pd.read_csv(path,sep='\t',decimal=',',names=[i for i in range(3)])

In this case, the decimal option makes no difference: I always get strings. Without the metadata, it works perfectly, e.g. with:

pd.read_csv(...,skiprows=3)

To me, it seems like pandas infers the type of each column from the first lines.

So how can I tell pandas to ignore the metadata?
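
For reference, the behaviour described above can be reproduced with a small in-memory sample. This is only a sketch: it assumes the real file is tab-separated, as the sep='\t' call suggests, and the SAMPLE string is made up from the excerpt in the question. Because the metadata lines are parsed as ordinary data rows, the first column ends up holding text, while skipping those lines leaves only numeric rows.

```python
import io
import pandas as pd

# Assumed tab-separated variant of the excerpt shown in the question.
SAMPLE = (
    "title = someTitle\n"
    "date = 20.0.2019\n"
    "col= str1 str2 str3\n"
    "2,49\t42,01\t-0,50\n"
    "5,74\t11,03\t-0,43\n"
)

# The metadata lines are read as data rows, so column 0 contains text.
with_meta = pd.read_csv(io.StringIO(SAMPLE), sep='\t', decimal=',',
                        names=list(range(3)))
print(with_meta.dtypes)

# Skipping the three metadata lines leaves only numeric rows.
data_only = pd.read_csv(io.StringIO(SAMPLE), sep='\t', decimal=',',
                        names=list(range(3)), skiprows=3)
print(data_only.dtypes)
```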

Answer 1

read_csv can read from a file-like object, so you can open the file, read the first 3 lines as headers, extract the column names, and optionally pass them to read_csv. In addition, you can force the data type with the dtype option. The code could be:

import pandas as pd

with open(path) as fd:
    # consume the three metadata lines so that read_csv only sees the data
    headers = [next(fd) for _ in range(3)]
    df = pd.read_csv(fd, sep=' ', decimal=',', dtype=float, names=...)

You can use the header part to set the column names if you want:

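import pandas as pd

with open(path) as fd:
    # consume the three metadata lines so that read_csv only sees the data
    headers = [next(fd) for _ in range(3)]
    # third line looks like "col= str1 str2 str3": keep what follows the '='
    cols = headers[2].split('=', 1)[1].strip().split(' ')
    df = pd.read_csv(fd, sep=' ', decimal=',', dtype=float, names=cols)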

You would get:

   str1   str2  str3
0  2.49  42.01 -0.50
1  5.74  11.03 -0.43
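
If the metadata itself is also needed, the same idea can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: read_with_metadata and its n_meta parameter are hypothetical names, the sample text is copied from the question, and it assumes every metadata line has the form "key = value" and that the data is separated by single spaces.

```python
import io
import pandas as pd

# Sample content copied from the question (space-separated, decimal commas).
SAMPLE = """title = someTitle
date = 20.0.2019
col= str1 str2 str3
2,49 42,01 -0,50
5,74 11,03 -0,43
"""

def read_with_metadata(fd, n_meta=3):
    """Hypothetical helper: consume n_meta 'key = value' lines, then parse the rest."""
    meta = {}
    for _ in range(n_meta):
        # split each metadata line at the first '=' only
        key, _sep, value = next(fd).partition('=')
        meta[key.strip()] = value.strip()
    # the 'col' entry holds the column names, separated by whitespace
    cols = meta['col'].split()
    df = pd.read_csv(fd, sep=' ', decimal=',', dtype=float, names=cols)
    return meta, df

meta, df = read_with_metadata(io.StringIO(SAMPLE))
print(meta)
print(df.dtypes)
```

With a real file, the io.StringIO(SAMPLE) argument would be replaced by the handle returned from open(path), as in the snippets above.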

solution1
0 ACCEPTED 2019-02-01 15:50:17