Add column name to a DataFrame in for loop in pandas

Question

My dataset has no header row, so the columns have no names; the data starts on the very first line. I'd like to add column names.

Edit: added a sample of the dataset:

30/10/2016 17:18:51 [13] 10-Full: L 1490; A 31; F 31; S 31; DL 0; SL 0; DT 5678
30/10/2016 17:18:51 [13] 00-Always: Returning 31 matches
30/10/2016 17:18:51 [13] 30-Normal: Query complete
30/10/2016 17:18:51 [13] 30-Normal: Request completed in 120 ms.
30/10/2016 17:19:12 [15] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:12 [15] 00-Always: action=Query&Text=(("XXXXXX":*/DOCUMENT/DRECONTENT/ObjectInfo/type+OR+"XXXXXX":*/DOCUMENT/.....
30/10/2016 17:19:12 [15] 10-Full: L 2; A 1; F 1; S 0; DL 0; SL 0; DT 5373
30/10/2016 17:19:12 [15] 00-Always: Returning 0 matches
30/10/2016 17:19:12 [15] 30-Normal: Query complete
30/10/2016 17:19:12 [15] 30-Normal: Request completed in 93 ms.
30/10/2016 17:19:20 [17] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:20 [17] 00-Always: action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX/type+AND+XXXXXX.......
30/10/2016 17:19:51 [19] 10-Full: L 255; A 0; F 0; S 0; DL 0; SL 0; DT 5021
30/10/2016 17:19:51 [19] 00-Always: Returning 0 matches
30/10/2016 17:19:51 [19] 30-Normal: Query complete
30/10/2016 17:19:51 [19] 30-Normal: Request completed in 29 ms.
30/10/2016 17:20:44 [27] 00-Always: Request from 120.0.0.1
30/10/2016 17:20:44 [27] 00-Always: action=Query&Tex(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+(
30/10/2016 17:20:44 [27] 10-Full: L 13; A 0; F 0; S 0; DL 0; SL 0; DT 5235
30/10/2016 17:20:44 [27] 00-Always: Returning 0 matches
30/10/2016 17:20:44 [27] 30-Normal: Query complete
30/10/2016 17:20:44 [27] 30-Normal: Request completed in 27 ms.
30/10/2016 17:21:09 [25] 00-Always: Request from 120.0.0.1
30/10/2016 17:21:09 [25] 00-Always: action=Query&Text=XXXXXX:*/DOCUMENT/DRECONTENT/ObjectIn

My Code:

for df in pd.read_csv('data.csv', sep='\s',  header=None, chunksize=6):
    df.reset_index(drop=True, inplace=True)
    df.fillna('', inplace=True)
    d = pd.DataFrame([df.loc[3,0], df.loc[3,1], ' '.join(df.loc[3,4:8]), ' '.join(df.loc[4,4:6]), ' '.join(df.loc[5,4:])])
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')

Output from "My Code":

30/10/2016;17:19:12;Request completed in 93 ms.;Request from 120.0.0.1;action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX....
30/10/2016;17:18:51;Request completed in 120 ms.;Request from 120.0.0.1;action=Query&Text=(("EOM.CompoundStory":*/DOCUMENT/DRECONTE....
30/10/2016;17:19:51;Request completed in 29 ms.;Request from 120.0.0.1;action=Query&Text=(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+((.....
30/10/2016;17:20:44;Request completed in 27 ms.;Request from 120.0.0.1;action=Query&Text=XXXXX:*/DOCUMENT/DRECONT....

Now I want to add a header such as 1;2;3;4;5 as the first row of the output.

My approach:

d.T.to_csv('out2.csv', index=False, header=['1', '2', '3', '4', '5'], mode='a', sep=';')

My Output:

1;2;3;4;5
07.11.2016;13:40:45;Request completed in 44 ms.;Request from 1.1.106 action=Query&Text=
1;2;3;4;5
07.11.2016;13:41:00;Request;completed in 37 ms.;Request from 1.1.106 ;action=Query&Text=   
1;2;3;4;5
07.11.2016;13:41:00;Request;completed in 32 ms.;Request from 1.1.106 ;action=Query&Text=

My expected output:

1;2;3;4;5
07.11.2016;13:40:45;Request completed in 44 ms.;Request from 1.1.106 action=Query&Text=
07.11.2016;13:41:00;Request;completed in 37 ms.;Request from 1.1.106 ;action=Query&Text=   
07.11.2016;13:41:00;Request;completed in 32 ms.;Request from 1.1.106 ;action=Query&Text=

Answer 1

You can create an empty DataFrame that has only the header, write it to out.log first, and then append the data without a header:

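import pandas as pd

cols = ['1', '2', '3', '4', '5']
# Write only the header line, once, before the loop (this overwrites any existing out.log)
pd.DataFrame(columns=cols).to_csv('out.log', index=False, sep=';')

# Read the log six lines at a time; the raw string avoids an invalid-escape warning
for df in pd.read_csv('data.csv', sep=r'\s+', header=None, chunksize=6):
    df.reset_index(drop=True, inplace=True)
    df.fillna('', inplace=True)
    # Build one output record from rows 3, 4 and 5 of the chunk
    d = pd.DataFrame([df.loc[3,0], 
                      df.loc[3,1], 
                      ' '.join(df.loc[3,4:8]), 
                      ' '.join(df.loc[4,4:6]), 
                      ' '.join(df.loc[5,4:])])
    # Append the record without repeating the header
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')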
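
If you want to try the idea without your data.csv, here is a minimal self-contained sketch of the same pattern. The rows list and the temporary output path are made up for illustration (they stand in for the records built inside the loop); only the two to_csv calls mirror the code above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample records standing in for the per-chunk rows built in the loop.
rows = [
    ["07.11.2016", "13:40:45", "Request completed in 44 ms.",
     "Request from 1.1.106", "action=Query&Text="],
    ["07.11.2016", "13:41:00", "Request completed in 37 ms.",
     "Request from 1.1.106", "action=Query&Text="],
]
cols = ["1", "2", "3", "4", "5"]

# Write to a temporary directory so the sketch does not touch a real out.log.
out_path = os.path.join(tempfile.mkdtemp(), "out.log")

# Step 1: an empty frame with named columns writes just the header line.
pd.DataFrame(columns=cols).to_csv(out_path, index=False, sep=";")

# Step 2: append every record with header=False so the header is not repeated.
for row in rows:
    pd.DataFrame([row]).to_csv(out_path, index=False, header=False,
                               mode="a", sep=";")

# Read the file back to check the result.
with open(out_path) as fh:
    lines = fh.read().splitlines()

print("\n".join(lines))
```

The header shows up only once because the only to_csv call that writes it runs before the loop; every call inside the loop appends with header=False.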

solution1
2 ACCEPTED 2016-12-09 09:40:58
