Create a new column in a Pandas DataFrame from the index of a specific character in another column

Question

I have a Pandas DataFrame that contains a column of emails:

Email
kitty@gmail.com
cat@yahoo.com
dog@aol.com
person@hrc.com

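The data type of this column/Series is unicode, which does not seem to work well with what I am trying to do. All I want is a column of the email domains in str format, like this: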

Domain
gmail.com
yahoo.com
aol.com
hrc.com

I tried this:

df['domain'] = df['Email'][:,df['Email'].find('@'):]

but I get an attribute error: 'Series' object has no attribute 'find'.

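I have searched all over Stack Overflow, but I only found ways to select a substring based on a single fixed integer position. That does not work here because the position of '@' differs in every row. I would really like to avoid a for loop. Does anyone know a simple way to do this? I suspect the unicode data type may be getting in the way.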

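Edit: When I create the sample data in a separate environment (IPython), the solution provided by @rojeeer works well, but when I use it on a table in Databricks (Python 2.7.10), I keep getting these errors: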

TypeError: split() got an unexpected keyword argument 'return_type'
TypeError: split() got an unexpected keyword argument 'expand'

I believe this is because the data in my table is encoded as unicode (or not encoded at all). I tried a couple of ways to convert it to str:

df['email'] = df['email'].map(lambda x: x.encode("utf-8"))
df['email'] = df['email'].encode("utf-8")

I also tried to normalize the data with the following attempts:

import unicodedata as ucd
df['email'] = ucd.normalize('NFKD', df['email'])

import unicodedata as ucd
df['email'] = ucd.normalize('NFKD', df['email']).encode('ascii','ignore')

import unicodedata as ucd

df['email'] = df['email'].map(lambda x: ucd.normalize('NFKD', x))

These keep returning errors:

AttributeError: 'NoneType' object has no attribute 'encode'
TypeError: must be unicode, not None

How can I convert this Series to str?

Answer 1

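In Pandas you cannot call str functions directly on a Series; you need to go through the .str accessor, as in df['Email'].str.function. See Working with Text Data in the pandas documentation.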

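For your application, I think there are two functions to choose from: str.split and str.extract. The str.split of a Series is very similar to the split of str: it splits each string on a delimiter, which may be a regular expression. str.extract is more powerful, since it lets you specify exactly what to extract.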
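As a rough illustration of the str.extract route, here is a minimal sketch. The sample frame is rebuilt from the question, and the pattern `@(.+)$` is only an assumption about what counts as the domain (everything after the first '@'); adjust it if your data needs stricter matching.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"Email": ["kitty@gmail.com", "cat@yahoo.com", "dog@aol.com", "person@hrc.com"]}
)

# Capture everything after the first "@".
# expand=False asks for a Series instead of a one-column DataFrame.
df["Domain"] = df["Email"].str.extract(r"@(.+)$", expand=False)

print(df)
```

Rows that do not match the pattern (including missing values) come back as NaN rather than raising an error.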

Here is sample code using str.split:

In [16]: df['Email'].str.split('@', expand=True)
Out[16]: 
        0          1
0   kitty  gmail.com
1     cat  yahoo.com
2     dog    aol.com
3  person    hrc.com

Setting expand=True expands the Series into a DataFrame with as many columns as there are pieces in the split result.
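To get just the domain as a new column, one possible variant is to index the split result with `.str[1]` instead of expanding it. This sketch does not pass the `expand` keyword at all, so it may also help if the TypeError in the question comes from an older pandas version that does not accept `expand` or `return_type` (that cause is a guess, not something confirmed in the question). The None row is added here on purpose, on the assumption that the real table contains missing emails, as the 'NoneType' errors suggest.

```python
import pandas as pd

# Sample data from the question, with one missing value added for illustration
df = pd.DataFrame(
    {"Email": ["kitty@gmail.com", "cat@yahoo.com", None, "person@hrc.com"]}
)

# Split each address on "@" into a list, then take the second element.
# Missing emails pass through as missing values instead of raising.
df["Domain"] = df["Email"].str.split("@").str[1]

print(df)
```

Because the .str methods skip missing values, no explicit encode or normalize step should be needed just to extract the domain.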

Hope this helps.
