
How can I split text in a pandas DataFrame column on multiple delimiters and create a new row for each piece?

Hello everyone, below is my dataset:

Name      University            Subject
John      Harvard               English, French
John      MIT                   Economics
Alan      BU                    Data Science & Math

I would like to have the following output:

Name      University            Subject
John      Harvard               English
John      Harvard               French
John      MIT                   Economics
Alan      BU                    Data Science
Alan      BU                    Math

I have tried the code below:

df.drop('Subject', axis=1).join(df['Subject'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('Subject'))

This works, but it only splits on ','. I would also like to split on '&'.
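The key idea is that a single regular expression can match either delimiter. As a minimal sketch using only Python's standard `re` module (not pandas), the pattern below, an illustrative choice not taken from the question, matches a comma or an ampersand together with any spaces around it:

```python
import re

# '[,&]' matches either a comma or an ampersand; '\s*' on each side
# also consumes the surrounding spaces, so no separate strip is needed.
pattern = r'\s*[,&]\s*'

samples = ['English, French', 'Economics', 'Data Science & Math']
pieces = [re.split(pattern, text) for text in samples]

for p in pieces:
    print(p)
```

In recent pandas versions, `Series.str.split` treats a multi-character pattern such as this one as a regular expression by default, so the same pattern could in principle be passed to it instead of ','.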

Please help. I am fairly new to Python and am open to using any library, such as pandas or NumPy.

I found the above solution in another Stack Overflow question, but I do not fully understand the steps. Please explain them as clearly as possible.

Thanks :)

You can pass a regular expression instead of just ',' so that `str.split` also splits on other characters. For example:

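import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Alan', 'Joe'],
    'University': ['Harvard', 'MIT', 'BU', 'NYU'],
    'Subject': ['English, French', 'Economics', 'Data Science & Math',
                'Economics and French'],
})

# Split on ',', '&' or the whole word 'and' (\b marks a word boundary, so a
# word such as 'Mandarin' is not split), then give each piece its own row.
subjects = (df['Subject']
            .str.split(r',|&|\band\b', expand=True)  # one column per piece
            .stack()                                  # columns -> rows
            .dropna()                                 # drop padding for shorter rows
            .reset_index(level=1, drop=True)          # keep only the original row label
            .rename('Subject'))
df = df.drop('Subject', axis=1).join(subjects)        # repeats a row once per piece

# remove extra white space
df['Subject'] = df['Subject'].str.strip()
print(df)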
   Name University       Subject
0  John    Harvard       English
0  John    Harvard        French
1  John        MIT     Economics
2  Alan         BU  Data Science
2  Alan         BU          Math
3   Joe        NYU     Economics
3   Joe        NYU        French
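Since the question asks what each step of the chain does, here is a minimal sketch that breaks it into named steps on the same sample data and prints the intermediate objects. The `.dropna()` after `stack()` is a defensive step: older pandas versions drop missing values inside `stack()` itself, but the newer `stack` implementation may keep them. The `explode` variant at the end is a different pandas technique (`DataFrame.explode`, available since pandas 0.25), included only for comparison; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Alan', 'Joe'],
    'University': ['Harvard', 'MIT', 'BU', 'NYU'],
    'Subject': ['English, French', 'Economics', 'Data Science & Math',
                'Economics and French'],
})
pattern = r',|&|\band\b'

# Step 1: split each string; expand=True puts the pieces into separate
# columns (0, 1, ...) and pads rows with fewer pieces with missing values.
wide = df['Subject'].str.split(pattern, expand=True)
print(wide, end='\n\n')

# Step 2: stack() moves the columns into rows, giving a Series indexed by
# (original row label, piece number); dropna() removes the padding.
stacked = wide.stack().dropna()
print(stacked, end='\n\n')

# Step 3: drop the piece-number level so each piece is labelled only with
# the row it came from, and name the Series so join() creates 'Subject'.
pieces = stacked.reset_index(level=1, drop=True).rename('Subject')

# Step 4: join on the index. A label that appears twice in `pieces`
# repeats the matching row of the left frame, which creates the new rows.
result = df.drop('Subject', axis=1).join(pieces)
result['Subject'] = result['Subject'].str.strip()
print(result, end='\n\n')

# For comparison: split into lists (no expand), then explode() gives each
# list element its own row while keeping the original row label.
alt = df.assign(Subject=df['Subject'].str.split(pattern)).explode('Subject')
alt['Subject'] = alt['Subject'].str.strip()
print(alt)
```

The `explode` version skips the stack/reset_index/join steps because it repeats the other columns itself, which is why it is often easier to read.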

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM