reformatting a sequential data file into a data frame using pandas

Question

I have an input file, now converted to a pandas.dataframe . The records/rows are in a sequence which contain related data of the form

    survey, a, b, c
    section, 1, 2, 3
    observation, a, b, c
    values, 1, 2, 3 
    values, 4, 5, 6
    observation, d, e, f
    values, 7, 8, 9
    section, 4, 5, 6
    ...

The survey record only occurs once. A section may occur multiple times and will contain observation and value records. Observations will always be followed by values sometimes as multiple records.

I am trying to reformat this into rows where each set of values is on a separate row with its corresponding survey, section, and observation.

survey, a,b,c, section, 1,2,3, observation, a,b,c, values, 1,2,3
survey, a,b,c, section, 1,2,3, observation, a,b,c, values, 4,5,6
survey, a,b,c, section, 1,2,3, observation, d, e, f, values, 7, 8, 9
survey, a,b,c, section, 4, 5, 6 and so on....

Can this be done with pandas or should I iterate through an if, then else structure ?

The methods I've tried so far are the following (these are probably simplistic and heading in the wrong directions):

#pd.DataFrame(hmdDataToProcess.unstack())

#hmdDataToProcess.unstack

#hmdDataToProcess.stack

#pd.melt(hmdDataToProcess, id_vars =[0], value_vars = 
['SURVEY','SECTION','OBSERV','OBVAL'])

#df2 = hmdDataToProc0ess.pivot_table(index = [0]).reset_index()

#df2 = df_in.pivot_table(index = 
#['Example1','Example2'],columns='VC', values=
#['Weight','Rank']).reset_index()

#hmdDataToProcess.groupby('SECTION').groups #, 'OBSERV', 'OBVAL'

Answer 1

You could do it without using Pandas

s = '''survey, a, b, c
     section, 1, 2, 3
     observation, a, b, c
     values, 1, 2, 3 
     values, 4, 5, 6
     observation, d, e, f
     values, 7, 8, 9
     section, 4, 5, 6'''
list_s = s.strip().split('\n')
list_s = [x.strip() for x in list_s]
list_s
# ['survey, a, b, c', 'section, 1, 2, 3', 'observation, a, b, c', 'values, 1, 2, 3', 'values, 4, 5, 6', 'observation, d, e, f', 'values, 7, 8, 9', 'section, 4, 5, 6']
for el in list_s:
   if el.split(',')[0] == 'survey':
     survey = el
   if el.split(',')[0] == 'section':
     section = el
   if el.split(',')[0] == 'observation':
     observation = el
   if el.split(',')[0] == 'values':
     print(f"{survey},{section},{observation},{el}")
#survey, a, b, c,section, 1, 2, 3,observation, a, b, c,values, 1, 2, 3
#survey, a, b, c,section, 1, 2, 3,observation, a, b, c,values, 4, 5, 6
#survey, a, b, c,section, 1, 2, 3,observation, d, e, f,values, 7, 8, 9

reformatting a sequential data file into a data frame using pandas

Question

1 answers

solution1
0 ACCPTED 2019-04-16 12:14:02

reformatting a sequential data file into a data frame using pandas

Question

1 answers

solution1 0 ACCPTED 2019-04-16 12:14:02

solution1
0 ACCPTED 2019-04-16 12:14:02