I have a huge raw dataset (4k lines in each text file) with a lot of vertical bars and spaces.
|group call| pvt call |phone call|group busy| pvt busy |phone busy|
time |total |total |total |total |total |total | %
period| sec cnt | sec cnt| sec cnt| sec cnt| sec cnt| sec cnt | usage
00:00 | 4323 548| 0 0| 0 0| 0 0| 0 0| 0 0| 18%
00:15 | 4125 479| 0 0| 0 0| 0 0| 0 0| 0 0| 17%
00:30 | 3071 395| 0 0| 0 0| 0 0| 0 0| 0 0| 13%
00:45 | 3514 447| 0 0| 0 0| 0 0| 0 0| 0 0| 14%
01:00 | 3081 383| 0 0| 0 0| 0 0| 0 0| 0 0| 13%
I want to convert it into a CSV file. The parser that I built using Python and pandas only reads CSV values. How can I do this? The CSV file should look something like:
time_pd,group_call_t_s,group_call_t_c,pvt_call_t_sec,pvt_call_t_c,phone_call_t_sec,phone_call_t_c,group_busy_t_sec,group_busy_t_c,pvt_busy_t_sec,pvt_busy_t_c,phone_busy_t_sec,phone_busy_t_c,per_usage
00:00,4323,548,0,0,0,0,0,0,0,0,0,0,18%
00:15,4125,479,0,0,0,0,0,0,0,0,0,0,17%
00:30,3071,395,0,0,0,0,0,0,0,0,0,0,13%
00:45,3514,447,0,0,0,0,0,0,0,0,0,0,14%
01:00,3081,383,0,0,0,0,0,0,0,0,0,0,13%
01:15,4017,470,0,0,0,0,0,0,0,0,0,0,18%
01:30,4767,555,0,0,0,0,0,0,0,0,0,0,18%
Python
If all files have the same header structure, you can read the data part, assign the headers, and then save to CSV:
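import pandas as pd
# Split on a bar with optional surrounding whitespace, or on a run of whitespace.
# Each alternative consumes at least one character, so the pattern never matches an empty string.
data = pd.read_csv("file1.txt", sep=r'\s*\|\s*|\s+', engine="python", header=None, skiprows=3)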
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
#0 00:00 4323 548 0 0 0 0 0 0 0 0 0 0 18%
#1 00:15 4125 479 0 0 0 0 0 0 0 0 0 0 17%
#2 00:30 3071 395 0 0 0 0 0 0 0 0 0 0 13%
#3 00:45 3514 447 0 0 0 0 0 0 0 0 0 0 14%
#4 01:00 3081 383 0 0 0 0 0 0 0 0 0 0 13%
# "..." below stands for the remaining column names; the real assignment must list all 14.
data.columns = "time_pd","group_call_t_s","group_call_t_c",...
data.to_csv("file1.csv", index=False)
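For reference, here is a minimal end-to-end sketch of the steps above, run on a few lines of the sample data held in memory instead of a real file. The column names are copied from the desired CSV header in the question. The helper name report_to_frame and the SAMPLE and SEP constants are made up for illustration. It assumes every report starts with exactly three header lines and that each data row splits into 14 fields.

```python
import io

import pandas as pd

# Column names taken from the desired CSV header in the question.
COLUMNS = [
    "time_pd",
    "group_call_t_s", "group_call_t_c",
    "pvt_call_t_sec", "pvt_call_t_c",
    "phone_call_t_sec", "phone_call_t_c",
    "group_busy_t_sec", "group_busy_t_c",
    "pvt_busy_t_sec", "pvt_busy_t_c",
    "phone_busy_t_sec", "phone_busy_t_c",
    "per_usage",
]

# A bar with optional surrounding whitespace, or a run of whitespace.
# Neither alternative can match the empty string.
SEP = r"\s*\|\s*|\s+"


def report_to_frame(source):
    """Hypothetical helper: parse one report given a path or file-like object."""
    return pd.read_csv(
        source,
        sep=SEP,
        engine="python",  # regex separators need the Python parsing engine
        header=None,      # the file's own header lines are not usable as names
        skiprows=3,       # assumption: exactly three header lines per file
        names=COLUMNS,
    )


# A few lines copied from the question, standing in for a real file.
SAMPLE = """\
|group call| pvt call |phone call|group busy| pvt busy |phone busy|
time |total |total |total |total |total |total | %
period| sec cnt | sec cnt| sec cnt| sec cnt| sec cnt| sec cnt | usage
00:00 | 4323 548| 0 0| 0 0| 0 0| 0 0| 0 0| 18%
00:15 | 4125 479| 0 0| 0 0| 0 0| 0 0| 0 0| 17%
"""

frame = report_to_frame(io.StringIO(SAMPLE))
csv_text = frame.to_csv(index=False)  # pass a path instead to write a file
print(csv_text)
```

To convert real files, pass the path of each text file to report_to_frame and call to_csv with an output path. If some files have a different number of header lines, skiprows would need adjusting for those files.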