I have a txt file that starts with a few lines of metadata, followed by the actual data in CSV style. The data contains floats that use a comma as the decimal separator, like this:
title = someTitle
date = 20.0.2019
col= str1 str2 str3
2,49 42,01 -0,50
5,74 11,03 -0,43
....
I need the whole information in pandas (0.24.0) and want the data as floats.
df = pd.read_csv(path,sep='\t',decimal=',',names=[i for i in range(3)])
In this case, the decimal option makes no difference: I always get strings. Without the metadata it works perfectly, e.g. with:
pd.read_csv(...,skiprows=3)
To me, it seems like pandas infers the column types from the first lines.
So how can I tell pandas to ignore the metadata?
read_csv can read from a file-like object, so you can open the file, read the first 3 lines as the header, extract the column names, and optionally pass them to read_csv. In addition, you can force the data type with the dtype option. The code could be:
import pandas as pd

with open(path) as fd:
    # consume the 3 metadata lines so that read_csv starts at the data
    headers = [next(fd) for i in range(3)]
    df = pd.read_csv(fd, sep=' ', decimal=',', dtype=float, names=...)
You can use the header part to set the column names if you want:
with open(path) as fd:
    headers = [next(fd) for i in range(3)]
    # third metadata line looks like "col= str1 str2 str3"
    cols = headers[2].split('=', 1)[1].strip().split(' ')
    df = pd.read_csv(fd, sep=' ', decimal=',', dtype=float, names=cols)
You would get:
   str1   str2  str3
0  2.49  42.01 -0.50
1  5.74  11.03 -0.43
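Since the question asks for the whole information, not just the numbers, here is a minimal self-contained sketch that also keeps the metadata. It assumes every header line has the form "key = value" and that the data is separated by single spaces; the names SAMPLE and read_with_metadata are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample that mirrors the file layout from the question.
SAMPLE = """title = someTitle
date = 20.0.2019
col= str1 str2 str3
2,49 42,01 -0,50
5,74 11,03 -0,43
"""


def read_with_metadata(fd, n_header=3):
    """Read n_header 'key = value' lines, then parse the rest as floats."""
    meta = {}
    for _ in range(n_header):
        # partition splits on the first '=' only
        key, _sep, value = next(fd).partition("=")
        meta[key.strip()] = value.strip()
    cols = meta["col"].split()
    # fd is now positioned at the first data line
    df = pd.read_csv(fd, sep=" ", decimal=",", names=cols, dtype=float)
    return meta, df


meta, df = read_with_metadata(io.StringIO(SAMPLE))
print(meta["title"])
print(df.dtypes)
```

With a real file, the same function should work on the handle returned by open(path). If the columns are separated by tabs or a variable number of spaces, sep would have to be adjusted accordingly.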