I didn't know how to title this question so if it should be renamed please inform me and I will.
I'm reading a csv
file that I saved off from a piece of equipment I used in taking some measurements. The data and various other key information is all being saved off. I've worked all day long on this and I cannot figure out how to retrieve every piece of information from this file properly. I need to access each piece of data/information and after doing so plot the data and read in the other various information like the data/timestamp/timezone/modelnumber/serialnumber and so on... I apologize if this question is too generic but I'm at lost on how to resolve this.
I've typed up several versions of code so I'll list only what I've been able to get to work. I have no idea why I have to use sep=delimiter
. I would think delimiter=',' would work but it doesn't. I've found from research that header=none
because my file doesn't have headings.
Canopy is telling me that the engine 'C' will not work so as a result I specify 'python'. From the output of this code it seems to capture everything. But its telling me I only have one column and have no idea how to separate out all this information.
Here is part of my csv file.
! FILETYPE CSV
! VERSION 1.0 1
! TIMESTAMP Friday 15 April 2016 04:50:05
! TIMEZONE (GMT+08:00) Kuala Lumpur Singapore
! NAME Keysight Technologies
! MODEL N9917A
! SERIAL US52240515
! FIRMWARE_VERSION A.08.05
! CORRECTION
! Application SA
! Trace TIMESTAMP: 2016-04-15 04:50:05Z
! Trace GPS Info...
! GPS Latitude:
! GPS Longitude:
! GPS Seconds Since Last Read: 0
! CHECKSUM 1559060681
! DATA Freq SA Max Hold SA Blank SA Blank SA Blank
! FREQ UNIT Hz
! DATA UNIT dBm
BEGIN
2000000000,-62.6893499803169,0,0,0
2040000000,-64.1528386206532,0,0,0
2080000000,-63.7751897198055,0,0,0
2120000000,-63.663056855945,0,0,0
2160000000,-64.227155790167,0,0,0
2200000000,-63.874804848758,0,0,0
END
Here is my code:
import pandas as pd
df = pd.read_csv('/Users/_XXXXXXXXX_/Desktop/TP1_041416_C.csv',
sep='delimter',
header=None,
engine='python')
UPDATE: - this version will read and parse the header part first, it will generate info
dictionary from the parsed header and will prepare skiprows
list for the pd.read_csv()
function
from collections import defaultdict
import io
import re
import pandas as pd
import pprint as pp
fn = r'D:\temp\.data\36671176.data'
def parse_header(filename):
"""
parses useful (check `header_flt` variable) information from the header
and saves it into `defaultdict`,
generates `skiprow` list for `pd.read_csv()`,
breaks after parsing the header, so the data block will NOT be read
returns: parsed info as defaultdict obj, skiprows list
"""
# useful header information that will be saved into defaultdict
header_flt = r'TIMESTAMP|TIMEZONE|NAME|MODEL|SERIAL|FIRMWARE_VERSION|Trace TIMESTAMP:'
with open(fn) as f:
d = defaultdict(list)
i = 0
skiprows = []
for line in f:
line = line.strip()
if line.startswith('!'):
skiprows.append(i)
# parse: `key` (first RegEx group)
# and `value` (second RegEx group)
m = re.search(r'!\s+(' + header_flt + ')\s+(.*)\s*', line)
if m:
# print(m.group(1), '->', m.group(2))
# save parsed `key` and `value` into defaultdict
d[m.group(1)] = m.group(2)
elif line.startswith('BEGIN'):
skiprows.append(i)
else:
# stop loop if line doesn't start with: '!' or 'BEGIN'
break
i += 1
return d, skiprows
info, skiprows = parse_header(fn)
# parses data block, the header part will be skipped
# NOTE: `comment='E'` - will skip the footer row: "END"
df = pd.read_csv(fn, header=None, usecols=[0,1], names=['freq', 'dBm'],
skiprows=skiprows, skipinitialspace=True, comment='E',
error_bad_lines=False)
print(df)
print('-' * 60)
pp.pprint(info)
OLD version: - reads the whole file in memory and parses it. It might cause problems with big files because it will allocate memory for the whole file content plus the resulting DF:
from collections import defaultdict
import io
import re
import pandas as pd
import pprint as pp
fn = r'D:\temp\.data\36671176.data'
header_pat = r'(TIMESTAMP|TIMEZONE|NAME|MODEL|SERIAL|FIRMWARE_VERSION)\s+([^\r\n]*?)\s*[\r\n]+'
def parse_file(filename):
with open(fn) as f:
txt = f.read()
m = re.search(r'BEGIN\s*[\r\n]+(.*)[\n\r]+END', txt, flags=re.DOTALL|re.MULTILINE)
if m:
data = m.group(1)
df = pd.read_csv(io.StringIO(data), header=None, usecols=[0,1], names=['freq', 'dBm'])
else:
df = pd.DataFrame()
d = defaultdict(list)
for m in re.finditer(header_pat, txt, flags=re.S|re.M):
d[m.group(1)] = m.group(2)
return df, d
df, info = parse_file(fn)
print(df)
pp.pprint(info)
Output:
freq dBm
0 2000000000 -62.689350
1 2040000000 -64.152839
2 2080000000 -63.775190
3 2120000000 -63.663057
4 2160000000 -64.227156
5 2200000000 -63.874805
defaultdict(<class 'list'>,
{'FIRMWARE_VERSION': 'A.08.05',
'MODEL': 'N9917A',
'NAME': 'Keysight Technologies',
'SERIAL': 'US52240515',
'TIMESTAMP': 'Friday 15 April 2016 04:50:05',
'TIMEZONE': '(GMT+08:00) Kuala Lumpur Singapore'})
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.