简体   繁体   中英

reformatting a sequential data file into a data frame using pandas

I have an input file, now converted to a pandas.dataframe . The records/rows are in a sequence which contain related data of the form

    survey, a, b, c
    section, 1, 2, 3
    observation, a, b, c
    values, 1, 2, 3 
    values, 4, 5, 6
    observation, d, e, f
    values, 7, 8, 9
    section, 4, 5, 6
    ...

The survey record only occurs once. A section may occur multiple times and will contain observation and value records. Observations will always be followed by values sometimes as multiple records.

I am trying to reformat this into rows where each set of values is on a separate row with its corresponding survey, section, and observation.

survey, a,b,c, section, 1,2,3, observation, a,b,c, values, 1,2,3
survey, a,b,c, section, 1,2,3, observation, a,b,c, values, 4,5,6
survey, a,b,c, section, 1,2,3, observation, d, e, f, values, 7, 8, 9
survey, a,b,c, section, 4, 5, 6 and so on....

Can this be done with pandas or should I iterate through an if, then else structure ?

The methods I've tried so far are the following (these are probably simplistic and heading in the wrong directions):

#pd.DataFrame(hmdDataToProcess.unstack())

#hmdDataToProcess.unstack

#hmdDataToProcess.stack

#pd.melt(hmdDataToProcess, id_vars =[0], value_vars = 
['SURVEY','SECTION','OBSERV','OBVAL'])

#df2 = hmdDataToProc0ess.pivot_table(index = [0]).reset_index()

#df2 = df_in.pivot_table(index = 
#['Example1','Example2'],columns='VC', values=
#['Weight','Rank']).reset_index()

#hmdDataToProcess.groupby('SECTION').groups #, 'OBSERV', 'OBVAL'

You could do it without using Pandas

s = '''survey, a, b, c
     section, 1, 2, 3
     observation, a, b, c
     values, 1, 2, 3 
     values, 4, 5, 6
     observation, d, e, f
     values, 7, 8, 9
     section, 4, 5, 6'''
list_s = s.strip().split('\n')
list_s = [x.strip() for x in list_s]
list_s
# ['survey, a, b, c', 'section, 1, 2, 3', 'observation, a, b, c', 'values, 1, 2, 3', 'values, 4, 5, 6', 'observation, d, e, f', 'values, 7, 8, 9', 'section, 4, 5, 6']
for el in list_s:
   if el.split(',')[0] == 'survey':
     survey = el
   if el.split(',')[0] == 'section':
     section = el
   if el.split(',')[0] == 'observation':
     observation = el
   if el.split(',')[0] == 'values':
     print(f"{survey},{section},{observation},{el}")
#survey, a, b, c,section, 1, 2, 3,observation, a, b, c,values, 1, 2, 3
#survey, a, b, c,section, 1, 2, 3,observation, a, b, c,values, 4, 5, 6
#survey, a, b, c,section, 1, 2, 3,observation, d, e, f,values, 7, 8, 9

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM