
Python: read strings from a file and split them into column names and values

I have a raw data file with multiple rows in the following format:

NAME: Jack Age : 25   skill : c++ designation : Analyst other comments:this is basic info

NAME : Kattie Age: 45 skill: python  designation: director Other Comments: name : Jane Kattie 

I want the output to look like this:

    name    age skill   designation  other_Comments      name_2 
0   Jack    25  c++     analyst      This is basic Info  NA
1   Kattie  45  python  Director     NA                  Jane Kattie

I have tried the code below, but it cannot handle special cases like row 2, where the keyword "name" appears again inside the comments. I am new to Python, so please suggest a better way if there is one. The keywords come from a fixed set of values, but each may appear more than once in a row.

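Code:

```python
import re

import pandas as pd

# Each row of the sheet holds one raw record in a single cell
df = pd.read_excel('mydata.xlsx', sheet_name="Sheet1", header=None)
df.columns = ['data']


def first_match(pattern, text):
    """Return the first captured group of pattern in text, or None if absent."""
    m = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else None


for i, x in df['data'].items():
    # Capture the text between one keyword and the next
    df.loc[i, 'Name'] = first_match(r'name\s*:(.*?)age', x)
    df.loc[i, 'Age'] = first_match(r'age\s*:(.*?)skill', x)
    df.loc[i, 'Skill'] = first_match(r'skill\s*:(.*?)designation', x)
    df.loc[i, 'Designation'] = first_match(r'designation\s*:(.*?)other comments', x)
    df.loc[i, 'Other_Comments'] = first_match(r'other comments\s*:(.*)', x)
```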
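Because the keywords come from a fixed set and may repeat, one alternative to writing a separate regex per field (a minimal sketch, not the poster's code) is to find every `keyword:` occurrence in a row and take the text up to the next occurrence as its value. The `KEYWORDS` list is an assumption read off the two sample rows, and `parse_row` is a hypothetical helper. A repeated keyword is stored as `name_2`, `name_3`, and so on, and an empty value is dropped so the column stays missing, as in the desired output.

```python
import re

# Assumption: the keyword set is read off the two sample rows; extend as needed.
KEYWORDS = ["name", "age", "skill", "designation", "other comments"]

# Matches any keyword (case-insensitive) followed by optional spaces and a colon.
KEY_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\s*:",
    flags=re.IGNORECASE,
)


def parse_row(text):
    """Hypothetical helper: split one raw row into a {column: value} dict."""
    record = {}
    matches = list(KEY_RE.finditer(text))
    for i, m in enumerate(matches):
        # The value runs from this "keyword:" to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = " ".join(text[m.end():end].split())  # collapse spaces/line breaks
        if not value:
            continue  # empty value: leave the column missing
        key = m.group(1).lower().replace(" ", "_")
        # Repeated keywords get numbered columns: name, name_2, name_3, ...
        col, n = key, 2
        while col in record:
            col, n = f"{key}_{n}", n + 1
        record[col] = value
    return record


rows = [
    "NAME: Jack Age : 25   skill : c++ designation : Analyst other comments:this \nis basic info",
    "NAME : Kattie Age: 45 skill: python  designation: director Other Comments: name : Jane Kattie ",
]
records = [parse_row(r) for r in rows]
for rec in records:
    print(rec)
```

Applying `parse_row` to each cell of the `data` column and passing the resulting list of dicts to `pd.DataFrame` gives one column per key, with NaN wherever a row lacks that key (for example `name_2` in row 0). The trade-off is that any keyword inside free text, such as "name" in a comment, is always treated as the start of a new field; that is what row 2 needs, but it may not suit every row.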

Python's standard library has a dedicated module for handling CSV files:

```python
import csv
```

For details on how to use it, see the csv module documentation on the python.org website.
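The `csv` module reads and writes delimiter-separated data; it does not parse the `keyword: value` layout above, so the rows still need to be split first. As a minimal sketch, assuming the rows have already been turned into dicts (the two records below are hard-coded from the sample data for illustration), `csv.DictWriter` can write them with the column layout from the question, filling absent fields through its `restval` parameter. The field names and the `"NA"` placeholder are assumptions.

```python
import csv
import io

# Hypothetical parsed records, hard-coded from the two sample rows.
records = [
    {"name": "Jack", "age": "25", "skill": "c++", "designation": "Analyst",
     "other_comments": "this is basic info"},
    {"name": "Kattie", "age": "45", "skill": "python", "designation": "director",
     "name_2": "Jane Kattie"},
]

# Output column order (assumed); keys missing from a record are written as "NA".
fieldnames = ["name", "age", "skill", "designation", "other_comments", "name_2"]

buf = io.StringIO()  # in-memory stand-in for an output file
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="NA")
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `open("output.csv", "w", newline="")`; the csv documentation asks for `newline=""` so that the writer controls line endings itself.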

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM