Pattern to split a column based on regex

Question

I have a data frame where each row represents a full name and a website. I need to split that into 2 columns: name and website.

I've tried to use pandas str.split but I'm struggling to create a regex pattern that catches any initial 'http' plus the rest of the website. I have websites starting with http and https.

import pandas as pd
df = pd.DataFrame([['John Smith http://website.com'],['Alan Delon https://alandelon.com']])

I want a pattern that correctly identifies the website so I can split my data. Any help would be very much appreciated.

Answer 1

Using str.split with a lookahead, so the split happens on the whitespace just before 'http':

pd.DataFrame(df[0].str.split(r'\s(?=http)').tolist()).rename({0:'Name',1:'Website'}, axis=1)

Output

         Name                Website
0  John Smith     http://website.com
1  Alan Delon  https://alandelon.com
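
To see what the lookahead pattern does on its own, here is a minimal sketch using only Python's built-in `re` module on the two sample strings from the question. The `rows` list and the `maxsplit=1` guard are assumptions added for illustration; they are not part of the answer above.

```python
import re

rows = ["John Smith http://website.com", "Alan Delon https://alandelon.com"]

# \s consumes the separating whitespace; (?=http) only checks that "http"
# follows, so the URL itself stays intact in the second piece.
pattern = re.compile(r"\s(?=http)")

# maxsplit=1 guards against a second " http" appearing later in the string.
pairs = [pattern.split(row, maxsplit=1) for row in rows]
print(pairs)
```

Because the lookahead is zero-width, only the single whitespace character is removed at the split point, which is why neither piece needs stripping afterwards.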

Answer 2

Using str.extract

Example:

import pandas as pd
df = pd.DataFrame([['John Smith http://website.com'],['Alan Delon https://alandelon.com']], columns=["data"])
# \s+ between the groups keeps the separating space out of the Name column
df[["Name", "Url"]] = df["data"].str.extract(r"(.*?)\s+(http.*)")
print(df)

Output:

                               data         Name                    Url
0     John Smith http://website.com  John Smith      http://website.com
1  Alan Delon https://alandelon.com  Alan Delon   https://alandelon.com
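
If some rows might not contain a website at all, a stricter pattern with named groups may be easier to reason about. The sketch below uses plain `re`; the helper name `split_row`, the `https?://` prefix, and the no-match fallback are all assumptions for illustration, not something the answer specifies.

```python
import re

# Hypothetical stricter pattern: name, whitespace, then an http:// or https:// URL.
pattern = re.compile(r"^(?P<name>.*?)\s+(?P<url>https?://\S+)$")


def split_row(text):
    """Return (name, url), or (text, None) when no URL is found."""
    m = pattern.match(text)
    if m is None:
        return text, None
    return m.group("name"), m.group("url")


print(split_row("John Smith http://website.com"))
print(split_row("No Website Here"))
```

The same pattern string should also work with `str.extract`, which uses named groups as the resulting column names.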
