I have a complex flat file with a huge amount of mixed-type data. I am trying to parse it using Python (the language I know best), and have succeeded in segregating the data by category using manual parsing.
Now I am stuck: I have extracted the data and need to make it tabular so that I can write it to xls, using pandas or any other library.
I have pasted the data at pastebin: https://pastebin.com/qn9J5nUL
The data comes in both non-tabular and tabular form. I need to discard the non-tabular data and write only the tabular data to xls. To be precise, I want to delete data like the following:
ABC Command-----UIP BLOCK:; SE: ABC_UIOP_89TP Report: +ve ABC_UIOP_89TP 2016-09-23 15:16:14 O&M #998459350 %%/*Web=1571835373:;%% ID = 0 Result Ok.
and write only data in the following format to xls (an example, not exact; please refer to the pastebin URL for the complete data format):
Local Info ID ID Name ID Frequency ID Data My ID
0 XXX_1 0 12 13
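Since your data file follows a consistent pattern, I think you can do it this way: each table starts at a line containing 'Local' and ends at a line containing '(Number of results', so you can record those line indexes and build a DataFrame from the lines in between.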
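    import re
    import pandas as pd

    starts = []  # indexes of header lines (they contain 'Local')
    ends = []    # indexes of footer lines (they contain '(Number of results')

    with open('data_to_be_parsed.txt') as f:
        datafile = f.readlines()

    for idx, line in enumerate(datafile):
        if 'Local' in line:
            starts.append(idx)
        if '(Number of results' in line:
            ends.append(idx)

    def split_cols(line):
        # Assumes columns are separated by two or more spaces, because the
        # column names themselves contain single spaces (e.g. "ID Name").
        return re.split(r'\s{2,}', line.strip())

    frames = []
    for start, end in zip(starts, ends):
        head = split_cols(datafile[start])
        # Skip blank lines between the header and the footer.
        rows = [split_cols(datafile[i]) for i in range(start + 1, end)
                if datafile[i].strip()]
        frames.append(pd.DataFrame([dict(zip(head, row)) for row in rows],
                                   columns=head))

    # DataFrame.append was removed in pandas 2.0, so collect the per-table
    # frames in a list and concatenate them once.
    maindf = pd.concat(frames, ignore_index=True)
    maindf.to_excel("output.xlsx")  # writing .xlsx needs an engine such as openpyxl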
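
To check the header/footer logic without touching the real file, the parsing step can be separated from pandas and run on a few in-memory lines. This is only a sketch: extract_tables and its parameters are names made up here, the sample lines are invented, the footer text "(Number of results = 1)" is a guess at the real format, and the two-or-more-spaces column separator is an assumption about the pastebin data.

```python
import re

def extract_tables(lines, start_marker="Local", end_marker="(Number of results"):
    """Return a list of (header, rows) pairs, one per table found in `lines`."""
    tables = []
    header, rows = None, []
    for line in lines:
        if end_marker in line:
            # Footer reached: close the current table, if one is open.
            if header is not None:
                tables.append((header, rows))
            header, rows = None, []
        elif start_marker in line:
            # Header row: split only on runs of two or more whitespace
            # characters so multi-word column names stay intact (assumption).
            header = re.split(r"\s{2,}", line.strip())
            rows = []
        elif header is not None and line.strip():
            # Non-blank line inside a table: treat it as a data row.
            rows.append(re.split(r"\s{2,}", line.strip()))
        # Anything outside a header/footer pair is non-tabular and is dropped.
    return tables

# Hypothetical sample, loosely modelled on the question's example.
sample = [
    "ABC Command-----UIP BLOCK:;",
    "Result Ok.",
    "Local Info ID  ID Name  ID Frequency  ID Data  My ID",
    "",
    "0              XXX_1    0             12       13",
    "(Number of results = 1)",
]
tables = extract_tables(sample)
header, rows = tables[0]
print(header)
print(rows)
```

Each (header, rows) pair can then be passed to pd.DataFrame(rows, columns=header) and the resulting frames concatenated as in the code above.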