Pandas str.count()
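I have a DataFrame with two columns, and I am trying to create a third column that counts how many times the value in the first column occurs in the second column.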
sample_df =
Object    Text
Banana    Banana Banana Banana
Banana    Apple Apple Apple
Apple     Banana Apple
Right now I am trying the following code:
sample_df['Mentions'] = sample_df['Text'].count(sample_df['Object'])
which raises the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-65-c9ae4ce28088> in <module>()
----> 1 sample_df['Mentions'] = sample_df['Text'].count(sample_df['Object'])
/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in count(self, level)
   1177             level = self.index._get_level_number(level)
   1178
-> 1179         lev = self.index.levels[level]
   1180         lab = np.array(self.index.labels[level], subok=False, copy=True)
   1181
AttributeError: 'RangeIndex' object has no attribute 'levels'
If you read the documentation for pd.Series.count, you will see that it does not do what you think it does:
Series.count(level=None)
Return number of non-NA/null observations in the Series
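To make the documented behavior concrete, here is a minimal sketch. The sample Series is made up for illustration and is not taken from the question.

```python
import pandas as pd

# Hypothetical sample data with one missing value
s = pd.Series(["Banana", None, "Apple"])

# Series.count() takes no search pattern; it only counts non-missing entries
non_null = s.count()
print(non_null)
```

Nothing here searches inside the strings, which is why it cannot be used to count occurrences of one column's value in another.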
You passed a pandas Series as the level argument, which is not valid, and that is why you get the error. For your use case, try the following:
df['counter'] = df.apply(lambda x: x.Text.count(x.Object), axis=1)
   Object                  Text  counter
0  Banana  Banana Banana Banana        3
1  Banana     Apple Apple Apple        0
2   Apple          Banana Apple        1
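For anyone who wants to reproduce this end to end, the following is a self-contained sketch that rebuilds the sample frame from the question and applies the same row-wise count. The construction of df is an assumption about how the data was created; the question only shows its printed form.

```python
import pandas as pd

# Rebuild the sample data shown in the question
df = pd.DataFrame({
    "Object": ["Banana", "Banana", "Apple"],
    "Text": ["Banana Banana Banana", "Apple Apple Apple", "Banana Apple"],
})

# Each row's Text is a plain Python string, so this calls the built-in
# str.count (substring count), not pandas' Series.count
df["counter"] = df.apply(lambda row: row["Text"].count(row["Object"]), axis=1)
print(df)
```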
If you care about performance, you can also use a simple list comprehension here:
df['counter'] = [i.count(j) for i, j in zip(df.Text, df.Object)]
Timings (use the list comprehension :D)
df = pd.concat([df]*10000)
%timeit df.apply(lambda x: x.Text.count(x.Object), axis=1)
1.14 s ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [i.count(j) for i, j in zip(df.Text, df.Object)]
6.71 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
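The %timeit lines above only work inside IPython/Jupyter. Outside of it, a rough equivalent can be sketched with the standard timeit module. The helper names with_apply and with_comprehension are made up for this example, the frame is scaled up less than in the timings above, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "Object": ["Banana", "Banana", "Apple"],
    "Text": ["Banana Banana Banana", "Apple Apple Apple", "Banana Apple"],
})
# Repeat the three rows to get a frame large enough to time
df = pd.concat([df] * 1000, ignore_index=True)

def with_apply():
    # Row-wise apply: builds a Series object for every row
    return df.apply(lambda row: row["Text"].count(row["Object"]), axis=1).tolist()

def with_comprehension():
    # Plain Python loop over the two columns in lockstep
    return [text.count(obj) for text, obj in zip(df["Text"], df["Object"])]

apply_seconds = timeit.timeit(with_apply, number=3)
comprehension_seconds = timeit.timeit(with_comprehension, number=3)
print(f"apply: {apply_seconds:.4f}s  comprehension: {comprehension_seconds:.4f}s")
```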
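from collections import Counter

def count(row):
    # Split Text on spaces and tally each whole word
    temp = row['Text'].split(' ')
    d = Counter(temp)
    # A Counter returns 0 for words that never appear
    return d[row['Object']]

df['Mentions'] = df.apply(count, axis=1)
print(df)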
   Object                  Text  Mentions
0  Banana  Banana Banana Banana         3
1  Banana     Apple Apple Apple         0
2   Apple          Banana Apple         1
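One thing worth noting: the Counter approach and the str.count approach agree on the sample data but are not equivalent in general. str.count counts substrings, while splitting on spaces counts whole words. A small sketch with a made-up input (not from the question) shows the difference:

```python
from collections import Counter

# Hypothetical input where the target also appears inside a longer word
text = "Bananas Banana"
target = "Banana"

# Substring count: matches inside "Bananas" as well as the standalone word
substring_hits = text.count(target)

# Whole-word count: only tokens exactly equal to the target
token_hits = Counter(text.split(" "))[target]

print(substring_hits, token_hits)
```

Which one is right depends on whether partial matches should count as mentions.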