
Reading in header information from csv file using Pandas

I have a data file that has 14 lines of header. In the header, there is the metadata for the latitude-longitude coordinates and time. I am currently using

pandas.read_csv(filename, delimiter=",", header=14)

to read in the file, but this only gets the data and I can't seem to get the metadata. Does anyone know how to read the information in the header? The header looks like:

CSD,20160315SSIO
NUMBER_HEADERS = 11
EXPOCODE = 33RR20160208
SECT_ID = I08
STNBBR = 1
CASTNO = 1
DATE = 20160219
TIME = 0558
LATITUDE = -66.6027
LONGITUDE = 78.3815
DEPTH = 462
INSTRUMENT_ID = 0401
CTDPRS,CTDPRS_FLAG,CTDTMP,CTDTMP_FLAG
DBAR,,ITS-90,,PSS-78

You have to parse the metadata header yourself, but you can do it neatly in a single pass, using each line as you read it to extract values from it or to check that the file is well formed.

First, open the file yourself:

f = open(filename)

Then parse each metadata line to extract data from it. For the sake of explanation, I'm just skipping these lines:

for _ in range(12):  # skip the 12 metadata lines (CSD ... INSTRUMENT_ID) that precede the column names
    f.readline()  # use the resulting string for metadata extraction

Now the file pointer sits on the column-name line, which is the header you want for the DataFrame. Conveniently, read_csv accepts file objects, so you can load the DataFrame straight from the same handle:

pandas.read_csv(f, sep=",", skiprows=[1])  # skiprows=[1] drops the units line (DBAR,,ITS-90,...) after the column names

Note that I don't pass the header argument: the lines before the column names have already been consumed, so the column-name line is the first line pandas sees and the default (header=0) uses it. You can adjust the number of lines to skip, or parse them instead of discarding them, starting from this example.
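The same idea can go one step further by letting the header describe itself: the second line, NUMBER_HEADERS, says how many metadata lines there are, so the parser does not need a hard-coded count. This is a minimal sketch, assuming (from the sample header only, not from a format specification) that NUMBER_HEADERS counts its own line plus the KEY = VALUE lines after it (11 in the sample), and that a single units line follows the column names. The function name read_ctd_exchange is hypothetical.

```python
import pandas as pd


def read_ctd_exchange(f):
    """Parse the metadata header of an open text file, then load the data with pandas.

    Assumptions (taken from the sample header, not from a formal spec):
    - line 1 is a file stamp such as "CSD,20160315SSIO";
    - line 2 is "NUMBER_HEADERS = N", where N counts that line plus the
      KEY = VALUE lines that follow it;
    - the metadata is followed by one column-name line and one units line.
    """
    stamp = f.readline().strip()
    key, value = f.readline().split("=", 1)
    if key.strip() != "NUMBER_HEADERS":
        # Simple correctness check on the file layout
        raise ValueError("expected NUMBER_HEADERS on line 2")
    n_headers = int(value)  # int() tolerates the surrounding spaces and newline
    meta = {"STAMP": stamp, "NUMBER_HEADERS": n_headers}

    # Read the remaining N - 1 metadata lines on the fly
    for _ in range(n_headers - 1):
        key, value = f.readline().split("=", 1)
        meta[key.strip()] = value.strip()

    # The file pointer now sits on the column-name line;
    # skiprows=[1] drops the units line right after it
    df = pd.read_csv(f, skiprows=[1])
    return meta, df
```

Usage would be along the lines of `with open(filename) as f: meta, df = read_ctd_exchange(f)`, after which `float(meta["LATITUDE"])` gives the latitude. A line without `=` inside the metadata block makes the `split` unpacking fail with a ValueError, which doubles as a basic check that the file has the expected layout.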

Although the following method does not use Pandas, I was able to extract the header information.

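import csv

with open(fname) as csvfile:
    forheader_IO2016 = csv.reader(csvfile, delimiter=',')
    header_IO2016 = {}
    for row in forheader_IO2016:
        if not row:
            continue  # csv.reader yields [] for blank lines
        if row[0].startswith("CTDPRS"):
            break  # column-name line reached (first column is CTDPRS in this file): metadata is over
        if "=" in row[0]:
            # "LATITUDE = -66.6027" -> key "LATITUDE", value "-66.6027", whatever the spacing
            key, value = row[0].split("=", 1)
            header_IO2016[key.strip()] = value.strip()

date = header_IO2016["DATE"]
time = header_IO2016["TIME"]
lat = float(header_IO2016["LATITUDE"])
lon = float(header_IO2016["LONGITUDE"])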
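Once DATE and TIME are available as strings, they can be combined into a single timestamp. A small sketch, assuming DATE is YYYYMMDD and TIME is HHMM as in the sample header (DATE = 20160219, TIME = 0558); the header does not state a time zone, so the result is left as a naive datetime. The helper name station_position_and_time is hypothetical.

```python
from datetime import datetime


def station_position_and_time(date, time, lat, lon):
    """Turn the DATE/TIME header strings into one datetime and the
    coordinates into floats (DATE assumed YYYYMMDD, TIME assumed HHMM)."""
    when = datetime.strptime(date + " " + time, "%Y%m%d %H%M")
    return when, float(lat), float(lon)
```

For the sample header, `station_position_and_time("20160219", "0558", "-66.6027", "78.3815")` returns `datetime(2016, 2, 19, 5, 58)` together with the two coordinates as floats.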
