I have a DataFrame with a Description column. Each row's description contains multiple lines of text: a series of dated updates for that record.
Example: for info no 1, we got an update on 07-01-2019 saying the sky is blue, and another on 05-22-2019 saying apples are red. Each update runs from one date stamp to the next. I would like to extract the text between consecutive dates and split each update into new columns: date, name, description.
The raw description looks like this:
info no | Description
--------------------------------------------------------------------------
1       | 07-01-2019 12:59:41 - XYZ (Work notes) The sky is blue in color.
        | Clouds are looking lovely.
        | 05-22-2019 12:00:49 - MNX (Work notes) Apples are red in color.
--------------------------------------------------------------------------
2       | 02-26-2019 12:53:18 - ABC (Work notes) Task is to separate balls.
        | 02-25-2019 16:57:57 - lMN (Work notes) He came by train.
        | That train was 15 min late.
        | He missed the concert.
        | 02-25-2019 11:08:01 - sbc (Work notes) She is my grandmother.
The desired output is:
info no | DATE                | NAME | DESCRIPTION
--------|---------------------|------|-----------------------------
1       | 07-01-2019 12:59:41 | XYZ  | The sky is blue in color.
        |                     |      | Clouds are looking lovely.
--------|---------------------|------|-----------------------------
1       | 05-22-2019 12:00:49 | MNX  | Apples are red in color.
--------|---------------------|------|-----------------------------
2       | 02-26-2019 12:53:18 | ABC  | Task is to separate balls.
--------|---------------------|------|-----------------------------
2       | 02-25-2019 16:57:57 | lMN  | He came by train.
        |                     |      | That train was 15 min late.
        |                     |      | He missed the concert.
--------|---------------------|------|-----------------------------
2       | 02-25-2019 11:08:01 | sbc  | She is my grandmother.
I tried:
import re
import pandas as pd
# Description is assumed to be one record's raw text as a single string
myDf = pd.DataFrame(re.split(r'(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} -.*)', Description), columns=['date'])
myDf['date'] = myDf['date'].replace(r'\(Work notes\)', '-', regex=True)
newQueue = myDf.date.str.split('-', n=3)
Given this DataFrame:
df
Description
Sl No
1 07-01-2019 12:59:41 - XYZ (Work notes) The sky...
2 05-22-2019 12:00:49 - MNX (Work notes) Apples...
3 02-26-2019 12:53:18 - ABC (Work notes) Task is...
4 02-25-2019 16:57:57 - lMN (Work notes) He came...
5 02-25-2019 11:08:01 - sbc (Work notes) She is ...
you can split the strings in the Description column on "(Work notes)" and then use values.tolist() to expand the resulting lists into two columns, as follows:
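import pandas as pd
# Split each description on the literal "(Work notes)" marker
parts = df['Description'].apply(lambda s: s.split('(Work notes)'))
# Expand the resulting lists into two columns, keeping the original index
x = pd.DataFrame(parts.values.tolist(), index=df.index)
print(x)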
0 1
Sl No
1 07-01-2019 12:59:41 - XYZ The sky is blue in color.
2 05-22-2019 12:00:49 - MNX Apples are red in color.
3 02-26-2019 12:53:18 - ABC Task is to separate balls.
4 02-25-2019 16:57:57 - lMN He came by train.
5 02-25-2019 11:08:01 - sbc She is my grandmother.
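The split above assumes each row already holds exactly one update, and it leaves the date and name together in column 0. For the layout in the question, where one record holds several updates and an update can span several lines, a regular expression that captures everything from one date stamp up to the next may be a better fit. The following is only a minimal sketch under stated assumptions: the sample frame `raw`, the column name `info_no`, the pattern `ENTRY`, and the helper `split_entries` are all hypothetical names introduced for illustration, and each record's text is assumed to be a single newline-separated string.

```python
import re
import pandas as pd

# Hypothetical sample data mirroring the question: one multi-line string per record.
raw = pd.DataFrame({
    "info_no": [1, 2],
    "Description": [
        "07-01-2019 12:59:41 - XYZ (Work notes) The sky is blue in color.\n"
        "Clouds are looking lovely.\n"
        "05-22-2019 12:00:49 - MNX (Work notes) Apples are red in color.",
        "02-26-2019 12:53:18 - ABC (Work notes) Task is to separate balls.\n"
        "02-25-2019 16:57:57 - lMN (Work notes) He came by train.\n"
        "That train was 15 min late.\n"
        "He missed the concert.\n"
        "02-25-2019 11:08:01 - sbc (Work notes) She is my grandmother.",
    ],
})

STAMP = r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}"

# One update = date stamp, " - ", name, "(Work notes)", then free text that
# runs (across newlines) until the next date stamp or the end of the string.
ENTRY = re.compile(
    rf"(?P<date>{STAMP}) - "
    r"(?P<name>[^\n]+?) \(Work notes\)\s*"
    r"(?P<description>.*?)"
    rf"(?=\s*{STAMP} - |\s*\Z)",
    flags=re.DOTALL,
)

def split_entries(info_no, text):
    """Return one dict per update found in a record's raw description."""
    return [{"info_no": info_no, **m.groupdict()} for m in ENTRY.finditer(text)]

rows = [
    row
    for info_no, text in zip(raw["info_no"], raw["Description"])
    for row in split_entries(info_no, text)
]
result = pd.DataFrame(rows, columns=["info_no", "date", "name", "description"])
print(result)
```

Multi-line descriptions keep their embedded newlines in the `description` column. If real timestamps are needed, the `date` column could then be converted with `pd.to_datetime(result["date"], format="%m-%d-%Y %H:%M:%S")`, assuming the stamps are month-day-year as the sample suggests.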