Pattern to split a column based on a regex

Question

I have a data frame where each row represents a full name and a website. I need to split that into two columns: name and website.

I tried pandas str.split, but I'm struggling to write a regex pattern that matches the leading 'http' plus the rest of the website. My websites start with either http or https.

import pandas as pd
df = pd.DataFrame([['John Smith http://website.com'],['Alan Delon https://alandelon.com']])

I want a pattern that correctly identifies the website so I can split my data. Any help would be very much appreciated.

Answer 1

Using str.split:

df[0].str.split(r'\s(?=http)', expand=True).rename(columns={0: 'Name', 1: 'Website'})

Output:

         Name                Website
0  John Smith     http://website.com
1  Alan Delon  https://alandelon.com
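
Why the lookahead works: `\s(?=http)` matches a whitespace character only when it is immediately followed by `http`, and since a lookahead consumes nothing, the URL itself is left untouched. The space inside the name is not followed by `http`, so it is not a split point. Here is a minimal sketch with the standard `re` module. The `split_name_url` helper and the sample rows are illustrative only and are not part of the answer above.

```python
import re

# Whitespace that is immediately followed by "http"; the lookahead
# (?=http) checks for the URL prefix without consuming it.
PATTERN = re.compile(r"\s(?=http)")

def split_name_url(text):
    """Split 'Full Name http(s)://site' into [name, url] (hypothetical helper)."""
    # maxsplit=1 guards against a second " http" later in the string.
    return PATTERN.split(text, maxsplit=1)

rows = ["John Smith http://website.com", "Alan Delon https://alandelon.com"]
parts = [split_name_url(r) for r in rows]
print(parts)
```

A string with no URL comes back as a single-element list, so it is worth checking the length of each result before unpacking it into two columns.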

Answer 2

Using str.extract:

Example:

import pandas as pd

df = pd.DataFrame([['John Smith http://website.com'],['Alan Delon https://alandelon.com']], columns=["data"])
# \s* keeps the separating space out of the Name group
df[["Name", "Url"]] = df["data"].str.extract(r"(.*?)\s*(http.*)")
print(df)

Output:

                               data        Name                    Url
0     John Smith http://website.com  John Smith     http://website.com
1  Alan Delon https://alandelon.com  Alan Delon  https://alandelon.com
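
A possible variant of the extract approach uses named groups, so the new columns get their names from the pattern itself. This is only a sketch, assuming every URL starts with `http://` or `https://`, contains no spaces, and ends the string. The names `pattern`, `extracted` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"data": ["John Smith http://website.com", "Alan Delon https://alandelon.com"]}
)

# Named groups become the column names of the extracted frame.
# https?:// matches both schemes; \s+ keeps the separator out of Name.
pattern = r"^(?P<Name>.*?)\s+(?P<Url>https?://\S+)$"
extracted = df["data"].str.extract(pattern)

# Rows that do not match the pattern come back as NaN in both columns.
result = df.join(extracted)
print(result)
```

Anchoring with `^` and `$` makes the match stricter than a bare `http.*`: a row that does not fit the expected shape yields NaN instead of a partial match, which tends to make bad rows easier to spot.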

Answer 1: score 0, accepted, posted 2019-10-10 08:56:51
Answer 2: score 0, posted 2019-10-10 08:57:07
