pandas str.count

Question

Consider the following DataFrame. I want to count how many times '$' occurs in each string. I am using the str.count function in pandas ( http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.count.html ).

>>> import pandas as pd
>>> df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
>>> df['A'].str.count('$')
0    1
1    1
2    1
Name: A, dtype: int64

I expected the result to be [2, 2, 1]. What am I doing wrong?

In plain Python, the count method of the built-in str type returns the correct result.

>>> a = "$$$$abcd"
>>> a.count('$')
4
>>> a = '$abcd$dsf$'
>>> a.count('$')
3

Answer 1

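$ has a special meaning in regular expressions: it is an anchor that matches the end of the string, and Series.str.count treats its argument as a regex. Escape it to match a literal dollar sign: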

In [21]: df.A.str.count(r'\$')
Out[21]:
0    2
1    2
2    1
Name: A, dtype: int64
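
To see why the unescaped pattern yields 1 for every row, it can help to run the same patterns through the standard re module. This is a minimal sketch, assuming pandas counts non-overlapping regex matches the way re.findall does; the variable names are only illustrative.

```python
import re

samples = ['$$a', '$$b', '$c']

# Unescaped '$' is an anchor: it matches the empty position at the end of
# each string, so there is exactly one (empty) match per string.
unescaped = [len(re.findall('$', s)) for s in samples]

# Escaped '\$' matches the literal dollar character.
escaped = [len(re.findall(r'\$', s)) for s in samples]

print(unescaped)  # [1, 1, 1]
print(escaped)    # [2, 2, 1]
```

The single match in the unescaped case is the zero-width end-of-string position, which is why the count does not depend on how many dollar signs the string contains.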

Answer 2

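As the other answers point out, the problem here is that $ denotes the end of the string. If you do not intend to use regular expressions, you may find that str.count (that is, the method of the built-in type str) is faster than its pandas counterpart: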

In [39]: df['A'].apply(lambda x: x.count('$'))
Out[39]: 
0    2
1    2
2    1
Name: A, dtype: int64

In [40]: %timeit df['A'].str.count(r'\$')
1000 loops, best of 3: 243 µs per loop

In [41]: %timeit df['A'].apply(lambda x: x.count('$'))
1000 loops, best of 3: 202 µs per loop

Answer 3

Try the pattern [$] so that $ is not treated as the end of the string (see this cheatsheet). Inside square brackets [] it is treated as a literal character:

In [3]:
df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
df['A'].str.count('[$]')

Out[3]:
0    2
1    2
2    1
Name: A, dtype: int64
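
When the character to count is not known in advance, escaping it by hand or wrapping it in brackets can be replaced with re.escape from the standard library. The sketch below assumes the same DataFrame as above; the names literal, pattern, by_escape and by_class are hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])

literal = '$'
# re.escape backslash-escapes regex metacharacters, giving the pattern \$
pattern = re.escape(literal)

by_escape = df['A'].str.count(pattern)
by_class = df['A'].str.count('[$]')

print(by_escape.tolist())  # [2, 2, 1]
print(by_class.tolist())   # [2, 2, 1]
```

Both forms match a literal dollar sign, so the choice between them is mostly a matter of readability.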

Answer 4

Taking inspiration from @fuglede:

pd.Series([x.count('$') for x in df.A.values.tolist()], df.index)

As @jezrael points out, the above fails when null values are present, so...

def tc(x):
    try:
        return x.count('$')
    # NaN and None have no count method, so treat them as zero occurrences
    except AttributeError:
        return 0

pd.Series([tc(x) for x in df.A.values.tolist()], df.index)
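
For comparison, the pandas string accessor does not raise on missing values; it leaves them missing in the result. The following sketch, using a small hypothetical Series s, shows one way to get the same zero-for-null behaviour as tc without a try/except.

```python
import numpy as np
import pandas as pd

s = pd.Series(['$$a', np.nan, '$c'])

# Series.str.count keeps missing entries missing instead of raising.
counts = s.str.count(r'\$')

# Treat missing entries as zero occurrences, mirroring tc above.
filled = counts.fillna(0).astype(int)

print(filled.tolist())  # [2, 0, 1]
```

Whether a missing value should count as 0 or stay missing depends on the analysis; this sketch only mirrors the choice made in tc.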

Timing

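import numpy as np
import pandas as pd

np.random.seed([3,1415])
# Build 100000 strings, each made of 0 to 99 '$' characters
df = pd.Series(np.random.randint(0, 100, 100000)) \
       .apply(lambda x: '$' * x).to_frame('A')

# Turn empty strings into NaN so that the null case is exercised
df['A'] = df['A'].replace('', np.nan)

def tc(x):
    try:
        return x.count('$')
    except AttributeError:
        return 0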
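
The measurements themselves are not included here. As a rough sketch of how the approaches from the answers could be compared outside IPython, the standard timeit module can be used. Everything below is an assumption made for illustration: the smaller data set, the seed, the absence of nulls, and the candidates dictionary are not part of the original answer, and no timings are claimed.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
# 1000 non-empty strings, far fewer than above, so the sketch runs quickly
lengths = rng.integers(1, 100, 1000).tolist()
s = pd.Series(['$' * n for n in lengths])

candidates = {
    'str.count': lambda: s.str.count(r'\$'),
    'apply': lambda: s.apply(lambda x: x.count('$')),
    'comprehension': lambda: pd.Series([x.count('$') for x in s.tolist()], s.index),
}

# All approaches should agree before their speed is compared.
expected = candidates['apply']().tolist()
for name, fn in candidates.items():
    assert fn().tolist() == expected, name

runs = 20
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f'{name}: {seconds / runs * 1e3:.3f} ms per call')
```

Results will vary with the pandas version, the data size, and the machine, so the numbers printed by such a script are only meaningful relative to each other.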

Answer 1
5  accepted  2016-11-29 21:00:09

Answer 2
3  2016-11-29 21:06:32

Answer 3
2  2016-11-29 21:01:09

Answer 4
1  2016-11-29 21:16:22
