Counting how many times an item occurs in a sequence using recursion in Python

Question

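I'm trying to count how many times an item occurs in a sequence, whether it's a list of numbers or a string. It works fine for numbers, but I get an error when looking for a letter like "i" in a string: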

def Count(f,s):
    if s == []: 
        return 0
    while len(s) != 0:
        if f == s[0]:
            return 1 + Count(f,s[1:])
        else:
            return 0 + Count(f,s[1:])

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Answer 1

There is a more idiomatic approach than recursion: use the built-in count method to count occurrences.

def count(sequence, item):
    return sequence.count(item)

>>> count("122333444455555", "4")
4

However, if you want to do it by iteration, a similar principle applies: convert it to a list, then loop over that list.

def count(sequence, item):
    total = 0
    for character in list(sequence):
        if character == item:
            total += 1
    return total

Answer 2

The problem is your first if, which explicitly checks whether the input is an empty list:

if s == []: 
    return 0

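For a string, s == [] is never true, so once s has been sliced down to '', the base case is skipped, the while loop doesn't run, and the function implicitly returns None, which triggers the TypeError. If you want it to work with both str and list, use: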

if not s:
    return 0

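In short, according to Python's truth value testing, any empty sequence is considered false and any non-empty sequence is considered true. If you want to know more, see the "Truth Value Testing" section of the Python documentation.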
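A quick way to see the difference between the truthiness check and the explicit comparison (is_empty is just a hypothetical helper name used for this illustration):

```python
def is_empty(seq):
    # Truth value testing: empty str, list, tuple, ... are all falsy.
    return not seq

print(is_empty(''), is_empty([]), is_empty('abc'))  # True True False

# An explicit comparison only matches one type, which is why the
# original s == [] check never fires for an empty string.
print('' == [], [] == [])  # False True
```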

You can also omit the while loop here. It's unnecessary because the body always returns during the first iteration and therefore leaves the loop.

So the result would be something like this:

def count(f, s):
    if not s: 
        return 0
    elif f == s[0]:
        return 1 + count(f, s[1:])
    else:
        return 0 + count(f, s[1:])

Example:

>>> count('i', 'what is it')
2

If you're interested not only in making it work but also in making it better, there are several possibilities.

Booleans are a subclass of integers

In Python, booleans are just integers (bool is a subclass of int), so they behave like integers in arithmetic:

>>> True + 0
1
>>> True + 1
2
>>> False + 0
0
>>> False + 1
1
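The subclass relationship can also be checked directly, and it is the reason sum() over booleans counts the True values:

```python
print(issubclass(bool, int))     # True
print(isinstance(True, int))     # True
print(True * 5, int(False))      # 5 0
print(sum([True, False, True]))  # 2
```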

So you can easily inline the if-else:

def count(f, s):
    if not s: 
        return 0
    return (f == s[0]) + count(f, s[1:])

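This works because f == s[0] returns True (which behaves like 1) if they are equal and False (which behaves like 0) if they are not. The parentheses aren't required, but I added them for clarity. And because the base case always returns an integer, the function itself always returns an integer.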

Avoiding copies in the recursive approach

Your approach creates a lot of copies of the input because of:

s[1:]

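This creates a shallow copy of the whole list (or string, ...) except for the first element. So every function call performs an operation that takes O(n) time and memory (where n is the number of elements), and because you do this recursively, the overall time and memory complexity becomes O(n**2).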
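To make the O(n**2) figure concrete: the slices have lengths n-1, n-2, ..., 0, so n(n-1)/2 elements are copied in total. The instrumented variant below is only an illustration; count_with_copy_stats and its stats argument (a one-element list used as a counter) are hypothetical additions that tally the copied elements:

```python
def count_with_copy_stats(f, s, stats):
    # Same slicing recursion as above; stats is a one-element list that
    # accumulates how many elements all the s[1:] copies contain.
    if not s:
        return 0
    rest = s[1:]
    stats[0] += len(rest)
    return (f == s[0]) + count_with_copy_stats(f, rest, stats)

n = 100
stats = [0]
count_with_copy_stats('a', 'a' * n, stats)
print(stats[0], n * (n - 1) // 2)  # 4950 4950
```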

You can avoid these copies, for example, by passing an index along:

def _count_internal(needle, haystack, current_index):
    length = len(haystack)
    if current_index >= length:
        return 0
    found = haystack[current_index] == needle
    return found + _count_internal(needle, haystack, current_index + 1)

def count(needle, haystack):
    return _count_internal(needle, haystack, 0)

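Because I need to pass the current index along, I added another function that takes it (I assume you don't want the index in the public function), but if you prefer, you can make it an optional argument: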

def count(needle, haystack, current_index=0):
    length = len(haystack)
    if current_index >= length:
        return 0

    return (haystack[current_index] == needle) + count(needle, haystack, current_index + 1)

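However, there may be an even better way. You can convert the sequence to an iterator and work with that internally: at the start of the function, take the next element from the iterator; if there is none, end the recursion, otherwise compare the element and recurse on the rest of the iterator: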

def count(needle, haystack):
    # Convert it to an iterator; if it already
    # is a (well-behaved) iterator, this is a no-op.
    haystack = iter(haystack)

    # Try to get the next item from the iterator
    try:
        item = next(haystack)
    except StopIteration:
        # No element remained
        return 0

    return (item == needle) + count(needle, haystack)

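Of course, if you want to avoid the overhead of the iter call, which is only needed on the first call, you can also use an internal helper. However, this is a micro-optimization and probably won't make things noticeably faster: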

def _count_internal(needle, haystack):
    try:
        item = next(haystack)
    except StopIteration:
        return 0

    return (item == needle) + _count_internal(needle, haystack)

def count(needle, haystack):
    return _count_internal(needle, iter(haystack))

The advantage of these two approaches is that they don't use (much) extra memory and avoid copying, so they should be faster and use less memory.

But for long sequences you'll run into trouble because of the recursion. Python has a recursion limit (it's adjustable, but only up to a point):

>>> count('a', 'a'*10000)
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-9-098dac093433> in <module>()
----> 1 count('a', 'a'*10000)

<ipython-input-5-5eb7a3fe48e8> in count(needle, haystack)
     11     else:
     12         add = 0
---> 13     return add + count(needle, haystack)

... last 1 frames repeated, from the frame below ...

<ipython-input-5-5eb7a3fe48e8> in count(needle, haystack)
     11     else:
     12         add = 0
---> 13     return add + count(needle, haystack)

RecursionError: maximum recursion depth exceeded in comparison
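The current limit can be inspected with sys.getrecursionlimit() (and changed with sys.setrecursionlimit()). A minimal sketch using the iterator-based version from above (repeated so the snippet runs on its own); safe_count is a hypothetical wrapper that reports None instead of letting the RecursionError propagate:

```python
import sys

def count(needle, haystack):
    # Iterator-based recursion from above: one stack frame per element.
    haystack = iter(haystack)
    try:
        item = next(haystack)
    except StopIteration:
        return 0
    return (item == needle) + count(needle, haystack)

def safe_count(needle, haystack):
    # Hypothetical wrapper: return None instead of raising RecursionError.
    try:
        return count(needle, haystack)
    except RecursionError:
        return None

print(sys.getrecursionlimit())       # CPython's default is 1000
print(safe_count('a', 'a' * 100))    # 100
print(safe_count('a', 'a' * 10000))  # None with the default limit
```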

Recursion using divide and conquer

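There are many ways to mitigate the problem (as long as you use recursion, you can't get rid of the recursion-depth issue entirely). A frequently used one is divide and conquer. Basically, this means splitting the sequence into 2 (sometimes more) parts and calling the function on each of those parts. The recursion bottoms out when only one item is left: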

def count(needle, haystack):
    length = len(haystack)
    # No item
    if length == 0:
        return 0
    # Only one item remained
    if length == 1:
        # I used the long version here to avoid returning True/False for
        # length-1 sequences
        if needle == haystack[0]:
            return 1
        else:
            return 0

    # More than one item, split the sequence in
    # two parts and recurse on each of them
    mid = length // 2
    return count(needle, haystack[:mid]) + count(needle, haystack[mid:])

Now the recursion depth changes from n to log(n), which makes the previously failing call possible:

>>> count('a', 'a'*10000)
10000
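To see the logarithmic depth concretely, here is a variant of the divide-and-conquer function that also returns the deepest recursion level it reached. count_dc_depth and its depth bookkeeping are hypothetical, added purely for illustration:

```python
def count_dc_depth(needle, haystack, depth=1):
    # Divide and conquer as above, but returns a (count, max_depth) tuple.
    length = len(haystack)
    if length == 0:
        return 0, depth
    if length == 1:
        return (1 if needle == haystack[0] else 0), depth
    mid = length // 2
    left, left_depth = count_dc_depth(needle, haystack[:mid], depth + 1)
    right, right_depth = count_dc_depth(needle, haystack[mid:], depth + 1)
    return left + right, max(left_depth, right_depth)

print(count_dc_depth('a', 'a' * 10000))  # (10000, 15)
```

For 10000 elements the depth is 15 (1 + ceil(log2(10000))), far below the default limit of 1000.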

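However, because it uses slicing, it again creates a lot of copies. Using iterators would be complicated (or impossible), because iterators (usually) don't have a size, but indices make it easy: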

def _count_internal(needle, haystack, start_index, end_index):
    length = end_index - start_index
    if length == 0:
        return 0
    if length == 1:
        if needle == haystack[start_index]:
            return 1
        else:
            return 0

    mid = start_index + length // 2
    res1 = _count_internal(needle, haystack, start_index, mid)
    res2 = _count_internal(needle, haystack, mid, end_index)
    return res1 + res2

def count(needle, haystack):
    return _count_internal(needle, haystack, 0, len(haystack))

Recursion using a built-in method

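Using a built-in method (or function) may seem silly here, since there is already a built-in method that solves the problem without recursion, but here is a version that uses the index method, which both strings and lists have: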

def count(needle, haystack):
    try:
        next_index = haystack.index(needle)
    except ValueError:  # the needle isn't present
        return 0

    return 1 + count(needle, haystack[next_index+1:])
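The slice in the last line still copies the remainder on every call. Both str.index and list.index accept an optional start position, so a variant can pass that along instead. This is a sketch; the start parameter name is my own choice, and it assumes a single-element needle (for strings, str.index also matches multi-character substrings). As a side effect, the recursion depth now grows with the number of occurrences rather than with the length:

```python
def count(needle, haystack, start=0):
    # Search from `start` instead of slicing off the already-searched part,
    # so the remaining sequence is never copied.
    try:
        next_index = haystack.index(needle, start)
    except ValueError:  # no further occurrence at or after `start`
        return 0
    return 1 + count(needle, haystack, next_index + 1)

print(count('i', 'what is it'))     # 2
print(count(3, [1, 3, 3, 2, 3]))    # 3
```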

Using iteration instead of recursion

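Recursion is really powerful, but in Python you have to contend with the recursion limit, and because Python has no tail call optimization, recursion is often rather slow. Both issues go away if you use iteration instead of recursion: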

def count(needle, haystack):
    found = 0
    for item in haystack:
        if needle == item:
            found += 1
    return found

Using built-in iteration approaches

If you prefer something more compact, you can also use a generator expression together with sum:

def count(needle, haystack):
    return sum(needle == item for item in haystack)

Again, this relies on booleans behaving like integers: sum adds up all the occurrences (ones) and all the non-occurrences (zeros), which gives the total count.

But if you're already using built-ins, it would be a shame not to mention the built-in method that strings and lists both have: count:

def count(needle, haystack):
    return haystack.count(needle)

At that point you probably don't need to wrap it in a function anymore and can use the method directly.

If you want to go even further and count all elements, you can use Counter from the built-in collections module:

>>> from collections import Counter
>>> Counter('abcdab')
Counter({'a': 2, 'b': 2, 'c': 1, 'd': 1})

Performance

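I've mentioned copies and their effect on memory and performance quite often, so I'd like to present some quantitative results showing that it really makes a difference.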

Here I used simple_benchmark, a fun project of mine (it's a third-party package, so you'll need to install it if you want to run this):

def count_original(f, s):
    if not s: 
        return 0
    elif f == s[0]:
        return 1 + count_original(f, s[1:])
    else:
        return 0 + count_original(f, s[1:])


def _count_index_internal(needle, haystack, current_index):
    length = len(haystack)
    if current_index >= length:
        return 0
    found = haystack[current_index] == needle
    return found + _count_index_internal(needle, haystack, current_index + 1)

def count_index(needle, haystack):
    return _count_index_internal(needle, haystack, 0)


def _count_iterator_internal(needle, haystack):
    try:
        item = next(haystack)
    except StopIteration:
        return 0

    return (item == needle) + _count_iterator_internal(needle, haystack)

def count_iterator(needle, haystack):
    return _count_iterator_internal(needle, iter(haystack))


def count_divide_conquer(needle, haystack):
    length = len(haystack)
    if length == 0:
        return 0
    if length == 1:
        if needle == haystack[0]:
            return 1
        else:
            return 0
    mid = length // 2
    return count_divide_conquer(needle, haystack[:mid]) + count_divide_conquer(needle, haystack[mid:])


def _count_divide_conquer_index_internal(needle, haystack, start_index, end_index):
    length = end_index - start_index
    if length == 0:
        return 0
    if length == 1:
        if needle == haystack[start_index]:
            return 1
        else:
            return 0

    mid = start_index + length // 2
    res1 = _count_divide_conquer_index_internal(needle, haystack, start_index, mid)
    res2 = _count_divide_conquer_index_internal(needle, haystack, mid, end_index)
    return res1 + res2

def count_divide_conquer_index(needle, haystack):
    return _count_divide_conquer_index_internal(needle, haystack, 0, len(haystack))


def count_index_method(needle, haystack):
    try:
        next_index = haystack.index(needle)
    except ValueError:  # the needle isn't present
        return 0

    return 1 + count_index_method(needle, haystack[next_index+1:])


def count_loop(needle, haystack):
    found = 0
    for item in haystack:
        if needle == item:
            found += 1
    return found


def count_sum(needle, haystack):
    return sum(needle == item for item in haystack)


def count_method(needle, haystack):
    return haystack.count(needle)

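import random
import string
import sys
from simple_benchmark import benchmark, MultiArgument

# Some variants recurse once per element and the largest input below has
# 2**11 = 2048 elements, more than CPython's default recursion limit of 1000.
sys.setrecursionlimit(10000)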

funcs = [count_divide_conquer, count_divide_conquer_index, count_index, count_index_method, count_iterator, count_loop,
         count_method, count_original, count_sum]
# Only recursive approaches without builtins
# funcs = [count_divide_conquer, count_divide_conquer_index, count_index, count_iterator, count_original]
arguments = {
    2**i: MultiArgument(('a', [random.choice(string.ascii_lowercase) for _ in range(2**i)]))
    for i in range(1, 12)
}
b = benchmark(funcs, arguments, 'size')

b.plot()

The plot uses a log-log scale so that the range of values is shown meaningfully; lower means faster.

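You can clearly see that for long inputs the original approach becomes very slow (because it copies the list, which results in O(n**2) behavior), while the other approaches behave linearly. It may seem strange that the divide-and-conquer approaches are slower, but that's because they need more function calls (roughly 2n instead of n), and function calls are expensive in Python. However, they can handle much longer inputs than the iterator and index variants before hitting the recursion limit.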

It's easy to change the divide-and-conquer approaches so that they run faster; a few possibilities come to mind:

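- Switch to a non-divide-and-conquer approach when the sequence is short.
- Always handle one element in each function call and only divide the rest of the sequence.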

But since this is probably just a recursion exercise, that's out of scope here.
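Even so, the first idea (switching to a plain loop for short ranges) can be sketched roughly as follows. This is only an illustration under assumptions: THRESHOLD = 32 is an arbitrary, hypothetical cutoff that would need tuning, and the index-based splitting is reused so that no slices are copied:

```python
# Hypothetical cutoff: ranges this short are counted with a plain loop.
THRESHOLD = 32

def _count_hybrid(needle, haystack, start_index, end_index):
    length = end_index - start_index
    if length <= THRESHOLD:
        # Short range: count directly instead of splitting further.
        found = 0
        for index in range(start_index, end_index):
            if haystack[index] == needle:
                found += 1
        return found
    # Long range: split into two halves using index bounds (no slicing).
    mid = start_index + length // 2
    return (_count_hybrid(needle, haystack, start_index, mid)
            + _count_hybrid(needle, haystack, mid, end_index))

def count_hybrid(needle, haystack):
    return _count_hybrid(needle, haystack, 0, len(haystack))
```

This keeps the logarithmic recursion depth while cutting the number of function calls by roughly the size of the cutoff.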

However, all of them perform much worse than the iterative approaches.

In particular, the count method of lists (and of strings) and manual iteration are much faster.
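If you'd rather not install a third-party package, a rough comparison can be made with the standard library's timeit module. This is a sketch under assumptions: the 500-element input (kept small so the recursive version stays under the default recursion limit) and the 200 repetitions are arbitrary choices, and the timings depend on your machine, so none are shown here:

```python
import timeit

def count_original(f, s):
    # Slicing recursion (with the corrected base case): copies on every call.
    if not s:
        return 0
    return (f == s[0]) + count_original(f, s[1:])

def count_loop(needle, haystack):
    # Plain iteration: no copies, no recursion.
    found = 0
    for item in haystack:
        if needle == item:
            found += 1
    return found

# 500 elements keeps the recursive version below the default recursion limit.
data = ['a', 'b'] * 250

candidates = {
    'recursive slicing': lambda: count_original('a', data),
    'loop': lambda: count_loop('a', data),
    'list.count': lambda: data.count('a'),
}
for name, call in candidates.items():
    seconds = timeit.timeit(call, number=200)
    print(f'{name:<18} {seconds:.4f} s')
```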

Answer 3

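The error happens because in some cases you simply don't return anything, so the function returns None. Returning 0 at the end of the function fixes the error. There are many better ways to do this in Python, but I assume this is just an exercise in recursive programming.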
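Applied to the code from the question, that fix might look like this (a sketch that leaves the original logic untouched and only adds the final return 0):

```python
def Count(f, s):
    if s == []:
        return 0
    while len(s) != 0:
        if f == s[0]:
            return 1 + Count(f, s[1:])
        else:
            return 0 + Count(f, s[1:])
    # Reached when s is an empty string: s == [] was False and the
    # while loop body never ran, so without this line None is returned.
    return 0

print(Count('i', 'what is it'))  # 2
print(Count(1, [1, 2, 1]))       # 2
```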

Answer 4

I think you're doing this the hard way.

You can do the same thing with Counter from collections:

from collections import Counter

def count(f, s):
    if s is None:
        return 0
    return Counter(s).get(f, 0)

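Counter returns a dict (subclass) object that holds the count of everything in s. Calling .get(f) on it returns the count of the particular item you're looking for. This works for lists of numbers as well as for strings.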
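One caveat worth checking: because Counter is a dict subclass, a plain .get(f) returns None when f doesn't occur at all, whereas indexing a Counter returns 0 for missing items. A small illustration ('banana' is just an example input):

```python
from collections import Counter

counts = Counter('banana')
print(counts.get('z'))     # None: dict.get returns None for a missing key
print(counts.get('z', 0))  # 0: explicit default
print(counts['z'])         # 0: Counter returns 0 for missing keys instead of raising KeyError
print(counts['a'])         # 3
```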

Answer 5

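If you're required to use recursion and determined to do so, I strongly recommend halving the problem rather than working through it one element at a time. Halving lets you handle much larger cases without running into a stack overflow.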

def count(f, s):
    l = len(s)
    if l > 1:
        mid = l // 2
        return count(f, s[:mid]) + count(f, s[mid:])
    elif l == 1 and s[0] == f:
        return 1
    return 0

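Counting how many times an item occurs in a sequence using recursion in Python: question and 5 answers

Answer 1: score 2, posted 2016-03-09 15:29:25
Answer 2: score 1, accepted, posted 2016-03-09 15:30:44 (sections: Booleans are a subclass of integers; Avoiding copies in the recursive approach; Recursion using divide and conquer; Recursion using a built-in method; Using iteration instead of recursion; Using built-in iteration approaches; Performance)
Answer 3: score 1, posted 2016-03-09 15:34:28
Answer 4: score 0, posted 2016-03-09 15:50:30
Answer 5: score 0, posted 2016-03-09 18:23:24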
