Find the nth occurrence of a substring in a string

Question

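This seems like it should be pretty trivial, but I am new to Python and want to do it the most Pythonic way.

I want to find the index corresponding to the nth occurrence of a substring within a string.

There has to be something equivalent to what I want to do, which is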

mystring.find("substring", 2nd)

How can you achieve this in Python?

Answer 1

Here is a more Pythonic version of the straightforward iterative solution:

def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start

Example:

>>> find_nth("foofoofoofoo", "foofoo", 2)
6

If you want to find the nth overlapping occurrence of needle, you can increment by 1 instead of len(needle), like this:

def find_nth_overlapping(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+1)
        n -= 1
    return start

Example:

>>> find_nth_overlapping("foofoofoofoo", "foofoo", 2)
3
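To make the difference between the two variants easier to see, here is a minimal sketch that lists every start index instead of only the nth one. The generator name occurrences and its overlapping flag are illustrative assumptions, not part of the answer's code.

```python
def occurrences(haystack, needle, overlapping=False):
    """Yield the start index of each occurrence of needle, left to right."""
    # Advance by one character for overlapping matches, otherwise by the
    # needle length. max(..., 1) keeps an empty needle from looping forever.
    step = 1 if overlapping else max(len(needle), 1)
    start = haystack.find(needle)
    while start >= 0:
        yield start
        start = haystack.find(needle, start + step)


non_overlapping = list(occurrences("foofoofoofoo", "foofoo"))
overlapping = list(occurrences("foofoofoofoo", "foofoo", overlapping=True))
print(non_overlapping)  # [0, 6]
print(overlapping)      # [0, 3, 6]
```

The second entry of each list matches the two examples above: 6 without overlap, 3 with overlap.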

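This is easier to read than Mark's version, and it needs neither the extra memory of the splitting version nor an import of the regular expression module. Unlike the various re approaches, it also adheres to a few of the rules in the Zen of Python:

Simple is better than complex.
Flat is better than nested.
Readability counts.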

Answer 2

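Mark's iterative approach would be the usual way, I think.

Here is an alternative based on string splitting, which can often be useful for finding-related processes: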

def findnth(haystack, needle, n):
    # n is zero-based here: n=0 finds the first occurrence
    parts = haystack.split(needle, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(haystack) - len(parts[-1]) - len(needle)
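The arithmetic in the split-based function is easier to follow with concrete values. The sketch below repeats the function and walks through one call; the sample string is only an illustration.

```python
def findnth(haystack, needle, n):
    # Copy of the split-based function above; n is zero-based.
    parts = haystack.split(needle, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(haystack) - len(parts[-1]) - len(needle)


haystack = "foo bar bar bar"

# For n = 1 the string is split at most n + 1 = 2 times, so everything
# after the second "bar" ends up in the last part.
parts = haystack.split("bar", 2)
print(parts)  # ['foo ', ' ', ' bar']

# Total length minus the tail minus the needle gives the start of the
# second "bar": 15 - 4 - 3 = 8.
second = findnth(haystack, "bar", 1)
missing = findnth(haystack, "bar", 3)  # there is no fourth "bar"
print(second, missing)  # 8 -1
```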

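And here is a quick one-liner for the second occurrence (somewhat dirty, in that you have to choose some chaff that can't match the needle):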

'foo bar bar bar'.replace('bar', 'XXX', 1).find('bar')

Answer 3

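This will find the second occurrence of substring in string.

def find_2nd(string, substring):
    return string.find(substring, string.find(substring) + 1)

Edit: I haven't thought much about performance, but a quick recursion can help with finding the nth occurrence: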

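def find_nth(string, substring, n):
    if n == 1:
        return string.find(substring)
    previous = find_nth(string, substring, n - 1)
    if previous == -1:
        # Fewer than n - 1 occurrences: stop instead of restarting at index 0.
        return -1
    return string.find(substring, previous + 1)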

Answer 4

Understanding that regex is not always the best solution, I'd probably use one here:

>>> import re
>>> s = "ababdfegtduab"
>>> [m.start() for m in re.finditer(r"ab",s)]
[0, 2, 11]
>>> [m.start() for m in re.finditer(r"ab",s)][2] #index 2 is third occurrence 
11
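The list comprehension above collects every match before picking one, and it treats the needle as a regex pattern. As a rough sketch, assuming the needle should be matched literally and n is 1-based, the same idea can stop at the nth match. The function name find_nth_re is hypothetical.

```python
import re
from itertools import islice


def find_nth_re(haystack, needle, n):
    """Return the start of the nth (1-based) literal match of needle, or -1."""
    # re.escape makes characters such as "." or "*" match literally.
    matches = re.finditer(re.escape(needle), haystack)
    # islice skips the first n - 1 matches lazily instead of building a list.
    match = next(islice(matches, n - 1, None), None)
    return match.start() if match is not None else -1


print(find_nth_re("ababdfegtduab", "ab", 3))  # 11
print(find_nth_re("ababdfegtduab", "ab", 4))  # -1
```

Note that islice raises ValueError for a negative start, so this sketch expects n >= 1.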

Answer 5

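I'm offering some benchmarking results comparing the most prominent approaches presented so far, namely @bobince's findnth() (based on str.split()) against @tgamblin's or @Mark Byers' find_nth() (based on str.find()). I will also compare with a C extension (_find_nth.so) to see how fast we can go. Here is find_nth.py: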

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

def find_nth(s, x, n=0, overlap=False):
    l = 1 if overlap else len(x)
    i = -l
    for c in xrange(n + 1):
        i = s.find(x, i + l)
        if i < 0:
            break
    return i

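Of course, performance matters most if the string is large, so suppose we want to find the 1000001st newline ('\n') in a 1.3 GB file called 'bigfile'. To save memory, we would like to work on an mmap.mmap object representation of the file: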

In [1]: import _find_nth, find_nth, mmap

In [2]: f = open('bigfile', 'r')

In [3]: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

Here findnth() already runs into its first problem, since mmap.mmap objects don't support split(). So we actually have to copy the whole file into memory:

In [4]: %time s = mm[:]
CPU times: user 813 ms, sys: 3.25 s, total: 4.06 s
Wall time: 17.7 s

Ouch! Fortunately s still fits in the 4 GB of memory of my Macbook Air, so let's benchmark findnth():

In [5]: %timeit find_nth.findnth(s, '\n', 1000000)
1 loops, best of 3: 29.9 s per loop

Clearly a terrible performance. Let's see how the approach based on str.find() does:

In [6]: %timeit find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 774 ms per loop

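Much better! Clearly, findnth()'s problem is that it is forced to copy the string during split(), which is already the second time we copied the 1.3 GB of data around after s = mm[:]. Here comes the second advantage of find_nth(): we can use it on mm directly, so that zero copies of the file are required: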

In [7]: %timeit find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 1.21 s per loop

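There appears to be a small performance penalty for operating on mm rather than s, but this illustrates that find_nth() can get us an answer in 1.2 s, compared to findnth's total of 47 s (17.7 s for the copy plus 29.9 s for the search).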
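The session above is Python 2. As a hedged sketch of the same zero-copy idea in Python 3, where mmap.find() expects a bytes needle, the following uses a small temporary file as a stand-in for 'bigfile'. The file contents and the loop variable names are assumptions made for illustration.

```python
import mmap
import os
import tempfile


def find_nth(s, x, n=0, overlap=False):
    # Python 3 port of find_nth() from find_nth.py; n is zero-based.
    step = 1 if overlap else len(x)
    i = -step
    for _ in range(n + 1):
        i = s.find(x, i + step)
        if i < 0:
            break
    return i


# Small stand-in for 'bigfile' (an assumption for this sketch).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line0\nline1\nline2\n")
    path = tmp.name

try:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Search the mapping directly; the file is never copied into a str.
            third_newline = find_nth(mm, b"\n", 2)
finally:
    os.remove(path)

print(third_newline)  # 17
```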

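I found no cases where the str.find() based approach was significantly worse than the str.split() based approach, so at this point I would argue that @tgamblin's or @Mark Byers' answer should be accepted instead of @bobince's.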

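In my testing, the version of find_nth() above was the fastest pure Python solution I could come up with (very similar to @Mark Byers' version). Let's see how much better we can do with a C extension module. Here is _find_nthmodule.c: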

#include <Python.h>
#include <string.h>

off_t _find_nth(const char *buf, size_t l, char c, int n) {
    off_t i;
    for (i = 0; i < l; ++i) {
        if (buf[i] == c && n-- == 0) {
            return i;
        }
    }
    return -1;
}

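off_t _find_nth2(const char *buf, size_t l, char c, int n) {
    const char *b = buf;
    const char *end = buf + l;
    do {
        /* Search only the bytes that remain, so memchr never reads past the buffer. */
        b = memchr(b, c, (size_t)(end - b));
        if (!b) return -1;
        ++b;
    } while (n--);
    return (b - 1) - buf;
}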

/* mmap_object is private in mmapmodule.c - replicate beginning here */
typedef struct {
    PyObject_HEAD
    char *data;
    size_t size;
} mmap_object;

typedef struct {
    const char *s;
    size_t l;
    char c;
    int n;
} params;

int parse_args(PyObject *args, params *P) {
    PyObject *obj;
    const char *x;

    if (!PyArg_ParseTuple(args, "Osi", &obj, &x, &P->n)) {
        return 1;
    }
    PyTypeObject *type = Py_TYPE(obj);

    if (type == &PyString_Type) {
        P->s = PyString_AS_STRING(obj);
        P->l = PyString_GET_SIZE(obj);
    } else if (!strcmp(type->tp_name, "mmap.mmap")) {
        mmap_object *m_obj = (mmap_object*) obj;
        P->s = m_obj->data;
        P->l = m_obj->size;
    } else {
        PyErr_SetString(PyExc_TypeError, "Cannot obtain char * from argument 0");
        return 1;
    }
    P->c = x[0];
    return 0;
}

static PyObject* py_find_nth(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyObject* py_find_nth2(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth2(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyMethodDef methods[] = {
    {"find_nth", py_find_nth, METH_VARARGS, ""},
    {"find_nth2", py_find_nth2, METH_VARARGS, ""},
    {0}
};

PyMODINIT_FUNC init_find_nth(void) {
    Py_InitModule("_find_nth", methods);
}

Here is the setup.py file:

from distutils.core import setup, Extension
module = Extension('_find_nth', sources=['_find_nthmodule.c'])
setup(ext_modules=[module])

Install as usual with python setup.py install. The C code has an advantage here since it is limited to finding single characters, but let's see how fast it is:

In [8]: %timeit _find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 218 ms per loop

In [9]: %timeit _find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 216 ms per loop

In [10]: %timeit _find_nth.find_nth2(mm, '\n', 1000000)
1 loops, best of 3: 307 ms per loop

In [11]: %timeit _find_nth.find_nth2(s, '\n', 1000000)
1 loops, best of 3: 304 ms per loop

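Clearly quite a bit faster still. Interestingly, there is no difference at the C level between the in-memory and mmapped cases. It is also interesting that _find_nth2(), which is based on string.h's memchr() library function, loses out against the straightforward implementation in _find_nth(): the additional "optimizations" in memchr() are apparently backfiring...

In conclusion, the implementation in findnth() (based on str.split()) is really a bad idea, since (a) it performs terribly for larger strings due to the required copying, and (b) it doesn't work on mmap.mmap objects at all. The implementation in find_nth() (based on str.find()) should be preferred in all circumstances (and should therefore be the accepted answer to this question).

There is still quite a bit of room for improvement, since the C extension ran almost a factor of 4 faster than the pure Python code, which suggests there might be a case for a dedicated Python library function.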
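For anyone who wants to repeat the pure Python part of the comparison without a 1.3 GB file, here is a small harness, assuming Python 3 and a synthetic in-memory string. The data size and the variable names are assumptions; absolute timings will differ from the numbers quoted above, so none are shown.

```python
import timeit


def findnth(haystack, needle, n):
    # str.split() based version; n is zero-based.
    parts = haystack.split(needle, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(haystack) - len(parts[-1]) - len(needle)


def find_nth(s, x, n=0, overlap=False):
    # str.find() based version, ported to Python 3 (range instead of xrange).
    step = 1 if overlap else len(x)
    i = -step
    for _ in range(n + 1):
        i = s.find(x, i + step)
        if i < 0:
            break
    return i


# Synthetic stand-in for 'bigfile': 200,000 lines of 18 characters each.
data = "some line of text\n" * 200_000
n = 100_000  # zero-based, so this is the 100,001st newline

split_result = findnth(data, "\n", n)
find_result = find_nth(data, "\n", n)

# Best of three single runs for each approach.
split_time = min(timeit.repeat(lambda: findnth(data, "\n", n), number=1, repeat=3))
find_time = min(timeit.repeat(lambda: find_nth(data, "\n", n), number=1, repeat=3))
print("split: %.4f s, find: %.4f s" % (split_time, find_time))
```

Both functions should return the same index. Only the relative timing is of interest here.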

Answer 6

Simplest way?

text = "This is a test from a test ok"

first_test = text.find('test')

print(text.find('test', first_test + 1))

Answer 7

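I'd probably do something like this, using the find function that takes an index parameter:

def find_nth(s, x, n):
    # Start one needle-length before index 0 so the first search begins at 0.
    i = -len(x)
    for _ in range(n):
        i = s.find(x, i + len(x))
        if i == -1:
            break
    return i

print(find_nth('bananabanana', 'an', 3))

I guess it's not particularly Pythonic, but it's simple. You could do it using recursion instead:

def find_nth(s, x, n, i=0):
    i = s.find(x, i)
    if n == 1 or i == -1:
        return i
    else:
        return find_nth(s, x, n - 1, i + len(x))

print(find_nth('bananabanana', 'an', 3))

It's a functional way to solve it, but I don't know if that makes it more Pythonic.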

Answer 8

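This will give you an array of the starting indices of the matches in yourstring:

import re
indices = [s.start() for s in re.finditer(':', yourstring)]

Then your nth entry would be:

n = 2
nth_entry = indices[n-1]

Of course you have to be careful with the index bounds. You can get the number of matches found in yourstring like this: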

num_instances = len(indices)
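The bounds check mentioned above can be folded into a small helper. This is a sketch under the assumption that -1 is an acceptable "not found" value and that n is 1-based; the name nth_match_start is hypothetical.

```python
import re


def nth_match_start(yourstring, pattern, n):
    """Return the start of the nth (1-based) match of pattern, or -1 if out of range."""
    indices = [m.start() for m in re.finditer(pattern, yourstring)]
    # Guard both ends: n = 0 would otherwise silently pick the last match.
    if 1 <= n <= len(indices):
        return indices[n - 1]
    return -1


print(nth_match_start("a:b:c", ":", 2))  # 3
print(nth_match_start("a:b:c", ":", 3))  # -1
```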

Answer 9

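Here is another approach using re.finditer.
The difference is that this only looks into the haystack as far as necessary.

import re
from itertools import dropwhile
needle = 'an'
haystack = 'bananabanana'
n = 2  # zero-based: n = 2 selects the third match
next(dropwhile(lambda x: x[0] < n, enumerate(re.finditer(needle, haystack))))[1].start()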

Answer 10

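Here is another re + itertools version that should work when searching for either a str or a RegexpObject. I will freely admit that this is likely over-engineered, but for some reason it entertained me.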

import itertools
import re

def find_nth(haystack, needle, n = 1):
    """
    Find the starting index of the nth occurrence of ``needle`` in \
    ``haystack``.

    If ``needle`` is a ``str``, this will perform an exact substring
    match; if it is a ``RegexpObject``, this will perform a regex
    search.

    If ``needle`` doesn't appear in ``haystack``, return ``-1``. If
    ``needle`` doesn't appear in ``haystack`` ``n`` times,
    return ``-1``.

    Arguments
    ---------
    * ``needle`` the substring (or a ``RegexpObject``) to find
    * ``haystack`` is a ``str``
    * an ``int`` indicating which occurrence to find; defaults to ``1``

    >>> find_nth("foo", "o", 1)
    1
    >>> find_nth("foo", "o", 2)
    2
    >>> find_nth("foo", "o", 3)
    -1
    >>> find_nth("foo", "b")
    -1
    >>> import re
    >>> either_o = re.compile("[oO]")
    >>> find_nth("foo", either_o, 1)
    1
    >>> find_nth("FOO", either_o, 1)
    1
    """
    if (hasattr(needle, 'finditer')):
        matches = needle.finditer(haystack)
    else:
        matches = re.finditer(re.escape(needle), haystack)
    start_here = itertools.dropwhile(lambda x: x[0] < n, enumerate(matches, 1))
    try:
        return next(start_here)[1].start()
    except StopIteration:
        return -1

Answer 11

Building on modle13's answer, but without the re module dependency.

def iter_find(haystack, needle):
    return [i for i in range(0, len(haystack)) if haystack[i:].startswith(needle)]

I kind of wish this were a built-in string method.

>>> iter_find("http://stackoverflow.com/questions/1883980/", '/')
[5, 6, 24, 34, 42]
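One caveat with iter_find() as written: in CPython, haystack[i:] builds a new string at every position, which gets expensive for long inputs. A possible variant, offered as a sketch rather than a benchmarked improvement, passes the offset to str.startswith() instead:

```python
def iter_find(haystack, needle):
    # startswith(needle, i) compares in place at offset i, so no suffix copy is made.
    return [i for i in range(len(haystack)) if haystack.startswith(needle, i)]


positions = iter_find("http://stackoverflow.com/questions/1883980/", "/")
print(positions)  # [5, 6, 24, 34, 42]
```

The nth occurrence (1-based) is then positions[n - 1], with the usual care about index bounds.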

Answer 12

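For the special case where you search for the nth occurrence of a character (i.e. a substring of length 1), the following function works by building a list of all positions at which the given character occurs:

def find_char_nth(string, char, n):
    """Find the n'th occurrence of a character within a string."""
    return [i for i, c in enumerate(string) if c == char][n-1]

If there are fewer than n occurrences of the given character, it raises IndexError: list index out of range.

This is derived from @Zv_oDD's answer and simplified for the case of a single character.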
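If an exception is not wanted, one option is to pull the nth position from a generator and fall back to a default. The sketch below assumes -1 as the "not found" value and a 1-based n; find_char_nth_safe is a hypothetical name.

```python
from itertools import islice


def find_char_nth_safe(string, char, n):
    """Return the index of the nth (1-based) occurrence of char, or -1."""
    # A generator avoids building the full list of positions.
    positions = (i for i, c in enumerate(string) if c == char)
    # Skip n - 1 positions, take the next one, default to -1 if exhausted.
    return next(islice(positions, n - 1, None), -1)


print(find_char_nth_safe("banana", "a", 2))  # 3
print(find_char_nth_safe("banana", "a", 4))  # -1
```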

Answer 13

>>> s = "abcdefabcdefababcdef"
>>> j = 0
>>> for n, i in enumerate(s):
...     if s[n:n+2] == "ab":
...         print(n, i)
...         j = j + 1
...         if j == 2: print("2nd occurrence at index position: ", n)
...
0 a
6 a
2nd occurrence at index position:  6
12 a
14 a

Answer 14

Here is another "tricky" solution, which uses split and join.

In your example, with the original string in ori, we can use

len("substring".join([s for s in ori.split("substring")[:2]]))
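The expression works because joining the first two pieces rebuilds everything before the second occurrence, so its length is that occurrence's index. When there are fewer occurrences, though, it returns the length of the whole string rather than -1. A guarded sketch, with a hypothetical function name and a 1-based n, could look like this:

```python
def find_nth_split_join(ori, sub, n):
    """Return the index of the nth (1-based) occurrence of sub in ori, or -1."""
    # Splitting at most n times yields n + 1 parts only if sub occurs n times.
    parts = ori.split(sub, n)
    if len(parts) <= n:
        return -1
    # Joining the first n parts rebuilds the text before the nth occurrence.
    return len(sub.join(parts[:n]))


print(find_nth_split_join("a-b-c", "-", 2))  # 3
print(find_nth_split_join("a-b-c", "-", 3))  # -1
```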

Answer 15

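# Return the index of the nth occurrence (0-indexed) of substr, or -1 if it does not exist.
def find_nth(s, substr, n):
    i = -1  # start before index 0 so a match at the very beginning is counted
    while n >= 0:
        n -= 1
        i = s.find(substr, i + 1)
        if i == -1:
            break  # stop here; searching again would restart from index 0
    return i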

Answer 16

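A solution without explicit loops or recursion.

Pass the required pattern to the compile method and set the desired occurrence in the variable n; the last statement prints the starting index of that occurrence of the pattern in the given string. Here the result of finditer, i.e. an iterator, is converted to a list and indexed directly. List indices are zero-based, so with n = 2 the code prints the start of the third match (9 for the sample string).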

import re
n=2
sampleString="this is history"
pattern=re.compile("is")
matches=pattern.finditer(sampleString)
print(list(matches)[n].span()[0])

Answer 17

Here is my solution for finding the nth occurrence of b in a string a:

from functools import reduce


def findNth(a, b, n):
    return reduce(lambda x, y: -1 if y > x + 1 else a.find(b, x + 1), range(n), -1)

It is pure Python and iterative. It returns -1 when n is 0 or too large. It is a one-liner and can be used directly. Here is an example:

>>> reduce(lambda x, y: -1 if y > x + 1 else 'bibarbobaobaotang'.find('b', x + 1), range(4), -1)
7

Answer 18

The replace one-liner is great, but it only works because XXX and bar have the same length.

A good and general definition would be:

def findN(s,sub,N,replaceString="XXX"):
    return s.replace(sub,replaceString,N-1).find(sub) - (len(replaceString)-len(sub))*(N-1)
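One thing to watch in findN(): when there are fewer than N occurrences, find() returns -1 and the length correction is still subtracted from it, so the result is only -1 if the replacement and the substring happen to have the same length. A minimal guarded sketch, assuming the replacement string never produces a new match of sub (the name find_n_checked is hypothetical):

```python
def find_n_checked(s, sub, n, replace_string="XXX"):
    """Return the index of the nth (1-based) occurrence of sub in s, or -1."""
    # Mask the first n - 1 occurrences, then look for the next one.
    pos = s.replace(sub, replace_string, n - 1).find(sub)
    if pos == -1:
        return -1  # do not apply the length correction to "not found"
    # Undo the shift caused by the n - 1 replacements of different length.
    return pos - (len(replace_string) - len(sub)) * (n - 1)


print(find_n_checked("foo bar bar bar", "bar", 2))        # 8
print(find_n_checked("foo bar bar bar", "bar", 2, "--"))  # 8
print(find_n_checked("foo bar bar bar", "bar", 4, "--"))  # -1
```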

Answer 19

Definition:

def get_first_N_words(mytext, mylen = 3):
    mylist = list(mytext.split())
    if len(mylist)>=mylen: return ' '.join(mylist[:mylen])

Usage:

get_first_N_words('  One Two Three Four ' , 3)

Output:

'One Two Three'

Answer 20

This avoids a failure or incorrect output when the requested occurrence is higher than the actual number of occurrences. For example, if you ask for the 3rd occurrence of 'o' in the string 'overflow' (which has only 2), the code below prints a message indicating that the requested occurrence exceeds the actual count:

Input Occurrence entered has exceeded the actual count of Occurrence

def check_nth_occurrence(string, substr, n):
    # Count the occurrences of substr (overlapping matches included)
    cnt = sum(1 for i in range(len(string)) if string.startswith(substr, i))

    # Check whether the requested occurrence exceeds the actual count
    if n > cnt:
        print('Input Occurrence entered has exceeded the actual count of Occurrence')
        return

    # Get the index of the first occurrence of substr
    index = string.find(substr)

    # Advance to the nth occurrence
    while index >= 0 and n > 1:
        index = string.find(substr, index + 1)
        n -= 1
    return index

Answer 21

Here is a simple and fun way to do it:

def index_of_nth(text, substring, n) -> int:
    index = 0
    for _ in range(n):
        index = text.index(substring, index) + 1
    return index - 1
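Because str.index() raises ValueError when the substring is missing, index_of_nth() raises as well when there are fewer than n occurrences. If a sentinel is preferred, a wrapper along these lines could be used. The default value of -1 and the function name are assumptions of this sketch.

```python
def index_of_nth_or_default(text, substring, n, default=-1):
    """Like index_of_nth(), but return default instead of raising ValueError."""
    index = 0
    try:
        for _ in range(n):
            # Continue one character past the previous match (overlaps allowed).
            index = text.index(substring, index) + 1
    except ValueError:
        return default
    return index - 1


print(index_of_nth_or_default("bananabanana", "an", 3))  # 7
print(index_of_nth_or_default("bananabanana", "an", 5))  # -1
```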

Answer 22

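Just in case anyone wants to find the nth occurrence from the back:

def find_nth_reverse(haystack: str, needle: str, n: int) -> int:
    end = haystack.rfind(needle)

    while end >= 0 and n > 1:
        # rfind's end bound is exclusive, so passing end already rules out overlaps.
        end = haystack.rfind(needle, 0, end)
        n -= 1

    return end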

Answer 23

I used the findnth() function and ran into some problems, so I rewrote a faster version of the function (no list splitting):

def findnth(haystack, needle, n):
    if needle not in haystack or haystack.count(needle) < n:
        return -1

    last_index = 0
    cumulative_last_index = 0
    for i in range(0, n):
        last_index = haystack[cumulative_last_index:].find(needle)
        cumulative_last_index += last_index
        
        # if not last element, then jump over it
        if i < n-1:
            cumulative_last_index += len(needle)

    return cumulative_last_index

Answer 24

This is how I solved it.

from typing import Optional

def second_index(text: str, symbol: str) -> Optional[int]:
    """
    Returns the second index of a symbol in a given text, or None if there is none.
    """
    first = text.find(symbol)
    result = text.find(symbol, first + 1)
    if result > 0:
        return result
    return None

Answer 25

This is the answer you really want:

def Find(String, ToFind, Occurence=1):
    index = 0
    count = 0
    while index <= len(String):
        if String[index:index + len(ToFind)] == ToFind:
            count += 1
            if count == Occurence:
                return index
        index += 1
    # Return -1 rather than False: False == 0 could be mistaken for a match at index 0.
    return -1

Answer 26

A simple solution for those with basic programming knowledge:

# Function to find the nth occurrence of a single-character substring in a text
def findnth(text, substring, n):

    # variable to store current index in loop
    count = -1

    # number of matches seen so far
    occurrence = 0

    # loop through string
    for letter in text:

        # increment count
        count += 1

        # if current letter in loop matches substring target
        if letter == substring:

            # increment occurrence
            occurrence += 1

            # if this is the nth time the substring is found
            if occurrence == n:

                # return its index
                return count

    # otherwise indicate there is no match
    return "No match"

# example of how to call function
print(findnth('C$100$150xx', "$", 2))

Answer 27

How about:

import os

c = os.getcwd().split('\\')
print('\\'.join(c[0:-2]))

27 answers

Answer 1: score 94, 2009-12-10 21:45:22
Answer 2: score 88, accepted, 2009-12-10 21:26:39
Answer 3: score 40, 2012-10-26 20:59:02
Answer 4: score 32, 2009-12-10 21:36:42
Answer 5: score 19, 2014-05-05 18:16:34
Answer 6: score 11, 2015-09-02 15:32:41
Answer 7: score 8, 2009-12-10 21:14:24
Answer 8: score 4, 2017-01-13 02:19:03
Answer 9: score 2, 2009-12-10 21:45:18
Answer 10: score 2, 2009-12-11 15:06:23
Answer 11: score 2, 2017-04-09 00:06:00
Answer 12: score 2, 2019-11-21 01:08:56
Answer 13: score 1, 2009-12-11 00:22:29
Answer 14: score 1, 2015-03-31 05:40:02
Answer 15: score 1, 2018-01-17 21:36:32
Answer 16: score 1, 2019-06-20 11:36:54
Answer 17: score 1, 2019-07-15 21:10:07
Answer 18: score 0, 2013-04-17 22:53:29
Answer 19: score 0, 2020-01-06 20:12:49
Answer 20: score 0, 2020-11-20 19:03:21
Answer 21: score 0, 2021-05-10 15:16:57
Answer 22: score 0, 2021-05-13 05:04:22
Answer 23: score 0, 2021-08-28 16:00:21
Answer 24: score 0, 2022-04-16 10:18:14
Answer 25: score -1, 2016-07-19 18:53:32
Answer 26: score -1, 2021-10-03 00:51:07
Answer 27: score -3, 2016-06-13 16:01:08
