I have a raw data file in the format below, with multiple rows:
NAME: Jack Age : 25 skill : c++ designation : Analyst other comments:this
is basic info
NAME : Kattie Age: 45 skill: python designation: director Other Comments: name : Jane Kattie
I want the output to be:
   name    age  skill   designation  other_Comments      name_2
0  Jack    25   c++     analyst      This is basic Info  NA
1  Kattie  45   python  Director     NA                  Jane Kattie
I have tried the code below, but it cannot handle special cases like row 2. I am new to Python, so please suggest a better way if there is one. The keywords come from a fixed set of values, but a keyword may appear more than once in a row.
Code:
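import re
import pandas as pd

# The keyword argument is sheet_name (not sheetname) in current pandas
file = pd.read_excel('mydata.xlsx', sheet_name="Sheet1", header=None)
file.columns = ['data']

def first(pattern, text):
    # Return the first captured group, stripped, or None when nothing matches
    m = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else None

for i in range(len(file)):
    x = file.loc[i, 'data']
    # Keywords vary in case and in spacing around ':', so match them loosely
    file.loc[i, 'Name'] = first(r'name\s*:(.*?)age', x)
    file.loc[i, 'Age'] = first(r'age\s*:(.*?)skill', x)
    file.loc[i, 'Skill'] = first(r'skill\s*:(.*?)designation', x)
    # Takes everything after "other comments:", so a repeated keyword (row 2) is not split out
    file.loc[i, 'Other_Comments'] = first(r'other comments\s*:(.*)', x)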
Python's standard library has a dedicated module for handling CSV files:
import csv
For details on how to use it, see the csv module documentation on the python.org website.
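Since the keywords are a fixed set, one possible approach is to split each record on the keywords themselves instead of chaining one regular expression per field. Below is a minimal sketch, not a definitive solution. It assumes the five keywords from the sample rows, that every keyword is followed by a colon, and that no value contains a keyword followed by a colon. `KEYWORDS`, `KEY_RE`, and `parse_record` are illustrative names, not part of any library.

```python
import re

# Assumed keyword set, taken from the sample rows in the question
KEYWORDS = ["name", "age", "skill", "designation", r"other\s+comments"]
KEY_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\s*:", re.IGNORECASE)

def parse_record(text):
    """Split one raw record into a dict keyed by normalized keyword."""
    # With a capturing group, re.split returns
    # [prefix, key1, value1, key2, value2, ...]
    parts = KEY_RE.split(text)
    record = {}
    for key, value in zip(parts[1::2], parts[2::2]):
        key = "_".join(key.lower().split())      # "Other Comments" -> "other_comments"
        value = " ".join(value.split()) or None  # collapse line breaks; empty -> None
        # A repeated keyword gets a numeric suffix: name, name_2, name_3, ...
        column, n = key, 1
        while column in record:
            n += 1
            column = f"{key}_{n}"
        record[column] = value
    return record

# The two sample rows from the question (the first one wraps onto a second line)
rows = [
    "NAME: Jack Age : 25 skill : c++ designation : Analyst other comments:this\nis basic info",
    "NAME : Kattie Age: 45 skill: python designation: director Other Comments: name : Jane Kattie",
]
records = [parse_record(r) for r in rows]
```

The resulting list of dicts can be passed to `pandas.DataFrame(records)`, which fills missing keys such as `name_2` with NaN, or written out with `csv.DictWriter` from the csv module mentioned above. If a comment could legitimately contain text like "age:", this splitting approach would need a stricter rule for where a keyword may start.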