Extracting particular characters/text from a DataFrame column
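I am trying to get the email provider from the mail column of a DataFrame and create a new column named "Mail_Provider". For example, take gmail from a@gmail.com and store it in the "Mail_Provider" column. I would also like to extract the country ISD code from the Phone column and create a new column for it. Is there any direct/simple way to do this other than regex?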
import pandas as pd

data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "mail": ["a@gmail.com", "b@yahoo.com", "c@gmail.com"],
    "Adress": ["Adress1", "Adress2", "Adress3"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})
Table:
Name mail Adress Phone
A a@gmail.com Adress1 +91-1234567890
B b@yahoo.com Adress2 +88-0987654321
C c@gmail.com Adress3 +27-2647589201
Expected result:
Name mail Adress Phone Mail_Provider ISD
A a@gmail.com Adress1 +91-1234567890 gmail 91
B b@yahoo.com Adress2 +88-0987654321 yahoo 88
C c@gmail.com Adress3 +27-2647589201 gmail 27
The regular expressions are fairly simple:
data['Mail_Provider'] = data['mail'].str.extract(r'@(\w+)\.', expand=False)
data['ISD'] = data['Phone'].str.extract(r'\+(\d+)-', expand=False)
If you really want to avoid regular expressions, then @Eva's answer is the way to go.
A lambda function will work:
data['Mail_Provider'] = data['mail'].apply(lambda x: x.split("@")[1].split(".")[0])
data['ISD'] = data['Phone'].apply(lambda x: x.split("+")[1].split("-")[0])
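The same split logic can also be written with pandas' vectorized `.str` accessor instead of `apply`. This is a minimal sketch, not part of the original answers; it rebuilds a trimmed copy of the question's frame so it runs on its own, and it assumes every address contains "@" and every phone number contains "-".

```python
import pandas as pd

# Trimmed copy of the frame from the question (the Adress column is omitted for brevity).
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "mail": ["a@gmail.com", "b@yahoo.com", "c@gmail.com"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})

# Take the text after "@", then keep the part before the first ".".
data["Mail_Provider"] = data["mail"].str.split("@").str[1].str.split(".").str[0]

# Drop the leading "+", then keep the part before the first "-".
data["ISD"] = data["Phone"].str.lstrip("+").str.split("-").str[0]

print(data)
```

Unlike the lambda version, the `.str` methods pass missing values through as NaN rather than raising an error on them.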
Hybrid approach (regex plus simple slicing; note that the slice assumes a two-digit ISD code):
In [693]: df['Mail_Provider'] = df['mail'].str.extract('@([^.]+)')
In [694]: df['ISD'] = df['Phone'].str[1:3]
In [695]: df
Out[695]:
Name mail Adress Phone Mail_Provider ISD
0 A a@gmail.com Adress1 +91-1234567890 gmail 91
1 B b@yahoo.com Adress2 +88-0987654321 yahoo 88
2 C c@gmail.com Adress3 +27-2647589201 gmail 27
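The fixed slice `str[1:3]` works for the sample data because every code there has two digits. As a hedged illustration, the sketch below uses hypothetical phone numbers (not from the question) with one-digit and three-digit codes to compare the slice with the delimiter-based regex from the earlier answer.

```python
import pandas as pd

# Hypothetical numbers chosen only to vary the ISD length; they are not from the question.
phones = pd.Series(["+1-2025550123", "+353-871234567", "+91-1234567890"])

# Fixed-width slice: always takes characters 1 and 2, whatever the code length.
by_slice = phones.str[1:3]

# Delimiter-based regex: captures all digits between "+" and the first "-".
by_regex = phones.str.extract(r"\+(\d+)-", expand=False)

print(pd.DataFrame({"Phone": phones, "slice": by_slice, "regex": by_regex}))
```

If the real data can contain codes of different lengths, the regex or a split on "-" is the safer choice.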