How do I find all occurrences of a substring?

Question

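Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example: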

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]

To count occurrences instead, see "Count number of occurrences of a substring in a string".

Answer 1

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

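re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.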
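
One thing the snippets above gloss over is that the pattern is interpreted as a regex, so a substring containing characters like . or + would not be matched literally. A minimal sketch of a wrapper, assuming you want literal matching (the name find_all_regex and the overlapping flag are made up for illustration, they are not part of re):

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Return the start indices of the literal substring `sub` in `text`."""
    pattern = re.escape(sub)  # treat sub as plain text, not as a regex
    if overlapping:
        # A zero-width lookahead consumes nothing, so matches may overlap.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_regex("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_regex("ttt", "tt", overlapping=True))   # [0, 1]
print(find_all_regex("a.b.c", "."))                    # [1, 3]
```

Without re.escape, the last call would match every character, since an unescaped . matches anything.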

Answer 2

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
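
A possible variation on the generator above, sketched under two assumptions of mine: that you'd rather pick overlapping vs. non-overlapping with a parameter than by editing the code, and that an empty sub should yield nothing (with the code as written above, an empty sub never advances start and yields 0 forever). The name find_all_step is hypothetical.

```python
def find_all_step(a_str, sub, overlapping=False):
    """Yield the start index of every occurrence of sub in a_str."""
    if not sub:
        return  # an empty substring would otherwise never advance
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_step("spam spam spam spam", "spam")))   # [0, 5, 10, 15]
print(list(find_all_step("ababa", "aba")))                   # [0]
print(list(find_all_step("ababa", "aba", overlapping=True))) # [0, 2]
```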

Answer 3

Using re.finditer:

import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
# re.escape makes the word match literally, even if it contains regex metacharacters
for match in re.finditer(re.escape(word), sentence):
    print((match.start(), match.end()))

For word = "this" and sentence = "this is a sentence this this" this will yield the output:

(0, 4)
(19, 23)
(24, 28)

Answer 4

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
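
To get a feel for "very inefficient", here is a rough timing sketch. The function names and the test string are made up for illustration, and the numbers will vary by machine, so no timings are claimed here. The idea is that the startswith version makes a Python-level call at every index, while the str.find loop lets the built-in search skip ahead between matches.

```python
import timeit

def scan_startswith(s, sub):
    # One startswith() call per index of s.
    return [i for i in range(len(s)) if s.startswith(sub, i)]

def scan_find(s, sub):
    # One find() call per match (plus a final call that returns -1).
    out, i = [], s.find(sub)
    while i != -1:
        out.append(i)
        i = s.find(sub, i + 1)
    return out

text = "x" * 100_000 + "test"  # hypothetical input: long string, single match
assert scan_startswith(text, "test") == scan_find(text, "test")

for fn in (scan_startswith, scan_find):
    seconds = timeit.timeit(lambda: fn(text, "test"), number=10)
    print(fn.__name__, seconds)
```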

Answer 5

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer 6

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

But it won't work for overlapping matches:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
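
If you do need the overlapping spans for a case like "ababa", one option (a sketch, not the only way) is a zero-width lookahead. Since the lookahead itself matches an empty string, m.end() equals m.start(), so the end of each span is computed from the substring length here. The variable names are just for illustration.

```python
import re

a_string = "ababa"
sub = "aba"

# (?=...) asserts that sub starts here without consuming any characters,
# so the next match attempt starts one position later.
pattern = "(?=" + re.escape(sub) + ")"
spans = [(m.start(), m.start() + len(sub)) for m in re.finditer(pattern, a_string)]
print(spans)  # [(0, 3), (2, 5)]
```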

Answer 7

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.
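
One caveat with the recursive version: every match adds a stack frame, so a string with more matches than the interpreter's recursion limit (sys.getrecursionlimit(), commonly 1000 by default) would raise RecursionError. A loop-based sketch of the same non-overlapping search, with a hypothetical name and assuming a non-empty substring:

```python
import sys

def locations_of_substring_iter(string, substring):
    """Same non-overlapping search as above, using a loop instead of recursion.

    Assumes substring is non-empty.
    """
    locations, start = [], 0
    while True:
        location = string.find(substring, start)
        if location == -1:
            return locations
        locations.append(location)
        start = location + len(substring)

# More matches than the recursion limit: fine for the loop version.
many = "ab" * (sys.getrecursionlimit() + 100)
print(len(locations_of_substring_iter(many, "ab")))
print(locations_of_substring_iter("this is a test for finding this and this", "this"))
```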

Answer 8

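If you're just looking for a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant. Note that both return a count rather than the indices.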

Answer 9

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

Answer 10

This does the trick for me, using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
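
Note that a plain 'as' pattern also matches inside longer words. If the goal is really the word 'as', a word-boundary pattern is one way to do it. A small sketch with a made-up sample sentence:

```python
import re

sample = "as fast as a flash, as stated"  # hypothetical text for illustration

# Plain pattern: also hits the "as" inside "fast" and "flash".
anywhere = [m.start() for m in re.finditer("as", sample)]

# \b matches at word boundaries, so only the standalone word "as" is found.
whole_words = [m.start() for m in re.finditer(r"\bas\b", sample)]

print(anywhere)     # [0, 4, 8, 15, 20]
print(whole_words)  # [0, 8, 20]
```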

Answer 11

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once there are no more matches
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        print("No more matches")
        marker = len(numberString)

Answer 12

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+len("test")] == "test":
...         print(index)
...
0
5
10
15

Answer 13

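The solutions provided by others are all based on the built-in find() method or other available methods.

What is the core basic algorithm for finding all occurrences of a substring in a string?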

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also derive a new class from str and use this function as a method, as below.

class newstr(str):
    def find_all(string, substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes

Calling the method

newstr.find_all('Do you find this answer helpful? then upvote this!','this')
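
For what it's worth, a subclass like this can also be used through an instance, which reads closer to the string.find_all(...) asked for in the question. A compact sketch (the class name NewStr is hypothetical, and the scan is the same naive slice comparison, so overlapping matches are included):

```python
class NewStr(str):
    """Hypothetical str subclass that adds a find_all method."""

    def find_all(self, substring):
        length = len(substring)
        # Compare the slice starting at every index against the substring.
        return [c for c in range(len(self)) if self[c:c + length] == substring]

s = NewStr("Do you find this answer helpful? then upvote this!")
print(s.find_all("this"))  # [12, 45]
print(s.upper())           # ordinary str methods still work
```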

Answer 14

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on a large list of search words.

Answer 15

This function does not look at every position in the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

Answer 16

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

Answer 17

You can try:

import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]

Answer 18

I think the cleanest solution is one without libraries and without yield:

def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)

find_all_occurrences("test test test test", "test")  # [0, 5, 10, 15]

Note: the find() method returns -1 when it can't find anything.

Answer 19

A Pythonic way would be (note that this only works when the search string is a single character):

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search string
# c represents the character string

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]

Answer 20

This is a solution to a similar question from HackerRank. I hope this helps you.

import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    #create two list as
    start_indc = [m.start() for m in re.finditer('(?=' + b + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))

Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

Answer 21

If you want to use only numpy, here is a solution:

import numpy as np

S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)

Answer 22

def find_index(string, let):
    enumerated = [place  for place, letter in enumerate(string) if letter == let]
    return enumerated

For example:

find_index("hey doode find d", "d")

returns:

[4, 7, 13, 15]

Answer 23

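Not exactly what the OP asked, but you could also use the split function to get a list of the pieces where the substring does not occur. The OP didn't specify the end goal of the code, but if your goal is to remove the substrings anyway, this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case.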

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

I only skimmed the other answers, so apologies if this is already up there.
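
Two related one-liners, sketched under the assumption that the goal really is removing or replacing the separator rather than locating it: str.replace does the join-after-split in one step, and re.split handles more than one separator character. The second sample string is made up for illustration.

```python
import re

s = "an-example-string"

# Same result as ' '.join(s.split('-')), in a single call.
print(s.replace("-", " "))  # 'an example string'

# A character class lets re.split break on several separators at once.
print(re.split(r"[-_]", "an-example_string"))  # ['an', 'example', 'string']
```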

Answer 24

def count_substring(string, sub_string):
    c=0
    for i in range(len(string) - len(sub_string) + 1):  # every start index where sub_string still fits
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c

if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    
    count = count_substring(string, sub_string)
    print(count)

Answer 25

I ran into the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

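I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).

All in all, it works as intended for what I was doing.

Edit: Please note that this works for single characters only, and it changes your variable, so you have to keep a copy of the string in a new variable if you want to preserve it. I didn't put that in the code because it's easy, and the code is only meant to show how I made it work.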

Answer 26

If you want to do it without re (regex), then:

find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]

string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]

Answer 27

def find_index(word, letter):
    index_list=[]
    for i in range(len(word)) :
        if word[i]==letter:
            index_list.append(i)
    return index_list

index_of_e=find_index('Getacher','e')
print(index_of_e)  # will give [1, 6]

Answer 28

Here's a solution I came up with, using assignment expressions (a new feature since Python 3.8):

string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]

Output:

[0, 5, 10, 15]
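
One thing to watch with this approach: str.count() counts non-overlapping occurrences, while find(phrase, start + 1) steps through overlapping ones, so the two can disagree and the list gets cut short. A sketch of the difference, plus a while-loop variant that keeps the assignment expression but stops only when find() returns -1 (the sample string and variable names are made up; Python 3.8+ assumed):

```python
string = "aaaa"
phrase = "aa"

# Approach from above: count() reports 2 here, so only two indices are collected.
start = -1
by_count = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]

# Variant: keep searching until find() reports -1.
start = -1
by_loop = []
while (start := string.find(phrase, start + 1)) != -1:
    by_loop.append(start)

print(by_count)  # [0, 1]
print(by_loop)   # [0, 1, 2]
```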

Answer 29

Find all the characters occurring in the given string and return them as a dictionary, e.g. for hello the result is {'h':1, 'e':1, 'l':2, 'o':1}

def count(string):
   result = {}
   if(string):
     for i in string:
       result[i] = string.count(i)
     return result
   return {}

Or else you can do it like this:

from collections import Counter

def count(string):
    return Counter(string)

Answer 30

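Try this, it worked for me!

x = input('enter the string')
y = input('enter the substring')
z, r = x.find(y), x.rfind(y)
while z != r:
    print(z, r, end=' ')
    z = z + len(y)
    # r is the start of the last match found from the right, so it is the
    # exclusive end bound for the next search
    z, r = x.find(y, z, r), x.rfind(y, z, r)
if z != -1:
    print(z)  # the two searches met on one last occurrence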

Answer 31

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黃哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

Answer 32

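By slicing, we find all the possible combinations and append them to a list, then use the count function to find how many times the substring occurs: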

s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))
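
Because every slice s[i:j] ends up in the list, l.count(f) effectively counts overlapping occurrences too. A sketch that should give the same count without building all the slices, assuming a non-empty f (the function name is made up):

```python
def count_overlapping(s, f):
    # Assumes f is non-empty. Checks each start index directly instead of
    # materialising every slice of s.
    return sum(1 for i in range(len(s)) if s.startswith(f, i))

print(count_overlapping("aaa", "aa"))  # 2
print("aaa".count("aa"))               # 1 (str.count skips overlapping matches)
```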

Answer 33

You can easily use:

string.count('test')!

https://www.programiz.com/python-programming/methods/string/count

Cheers!
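
Keep in mind that count() returns how many times the substring occurs, not where, so it only covers part of what the question asks. A quick sketch of what it gives you next to find(), using the string from the question:

```python
string = "test test test test"

print(string.count("test"))     # 4
print(string.count("test", 5))  # 3, counting only from index 5 onward
print(string.find("test", 5))   # 5, an index rather than a count
```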

Answer 1: 705, accepted, 2011-01-12 02:43:23
Answer 2: 158, 2011-01-12 03:13:28
Answer 3: 74, 2016-02-03 19:01:18
Answer 4: 69, 2011-01-12 02:48:05
Answer 5: 55, 2015-12-23 23:09:02
Answer 6: 25, 2011-01-12 02:55:19
Answer 7: 20, 2013-11-01 03:16:00
Answer 8: 13, 2014-09-24 21:12:28
Answer 9: 10, 2015-04-01 09:23:24
Answer 10: 9, 2018-07-06 09:34:35
Answer 11: 6, 2014-09-01 12:48:11
Answer 12: 6, 2018-02-27 06:44:02
Answer 13: 2, 2018-02-15 20:02:23
Answer 14: 2, 2018-09-28 17:29:11
Answer 15: 2, 2020-01-13 12:39:43
Answer 16: 2, 2020-05-16 17:05:55
Answer 17: 2, 2021-10-25 10:13:42
Answer 18: 2, 2022-10-13 20:06:12
Answer 19: 1, 2018-04-10 19:40:59
Answer 20: 1, 2020-01-20 22:47:51
Answer 21: 1, 2021-06-10 16:46:44
Answer 22: 0, 2020-11-08 13:49:48
Answer 23: 0, 2021-05-19 13:43:55
Answer 24: 0, 2021-06-02 03:24:06
Answer 25: 0, 2021-06-25 20:18:14
Answer 26: 0, 2021-11-05 08:38:13
Answer 27: 0, 2021-12-01 21:51:45
Answer 28: 0, 2022-04-08 10:06:17
Answer 29: 0, 2022-04-30 08:00:02
Answer 30: 0, 2022-06-09 13:17:37
Answer 31: -1, 2017-03-16 01:14:15
Answer 32: -1, 2019-07-30 11:44:03
Answer 33: -3, 2018-12-01 19:09:13
