How to find all occurrences of a substring?

Question

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# this is the goal (str has no find_all method, so this line fails as written)
print(string.find_all('test'))  # [0, 5, 10, 15]

(For counting the occurrences, see Count number of occurrences of a substring in a string.)

Answer 1

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, a lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you're only iterating through the results once.
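As a hedged sketch building on this answer, the helper below wraps both patterns in one function. The name `find_all_regex` and its `overlapping` flag are hypothetical, not part of the re module. It assumes you are searching for a literal substring, so it passes the input through `re.escape` first; otherwise characters such as `.` would act as wildcards.

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Return the start index of every occurrence of the literal string sub in text."""
    # re.escape makes characters such as '.', '+' or '(' match literally
    pattern = re.escape(sub)
    if overlapping:
        # A zero-width lookahead consumes no characters, so matches may overlap
        pattern = '(?=' + pattern + ')'
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_regex('test test test test', 'test'))  # [0, 5, 10, 15]
print(find_all_regex('a.b axb a.b', 'a.b'))            # [0, 8]
print(find_all_regex('ttt', 'tt', overlapping=True))   # [0, 1]
```

Without `re.escape`, the second call would also report the `axb` at index 4.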

Answer 2

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
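One caveat with the generator above: an empty `sub` makes `start += len(sub)` add zero, so the loop never ends. A minimal variant, assuming you would rather raise an error in that case and pick overlapping behaviour with a keyword argument (the `overlapping` parameter is an addition for illustration, not part of the answer):

```python
def find_all(a_str, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in a_str."""
    if not sub:
        # str.find('') matches at every index, and a step of len('') == 0 never advances
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all('spam spam spam spam', 'spam')))   # [0, 5, 10, 15]
print(list(find_all('aaaa', 'aa')))                    # [0, 2]
print(list(find_all('aaaa', 'aa', overlapping=True)))  # [0, 1, 2]
```

Because this is a generator, the ValueError only surfaces once you start iterating, for example inside `list(...)`.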

Answer 3

Use re.finditer:

import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
# re.escape makes the word match literally, even if it contains regex metacharacters
for match in re.finditer(re.escape(word), sentence):
    print((match.start(), match.end()))

For word = "this" and sentence = "this is a sentence this this", this will yield the output:

(0, 4)
(19, 23)
(24, 28)
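If you only need the (start, end) pairs, each match object also has a `span()` method that returns them as a tuple directly. A short sketch using the same example sentence, with the input() calls replaced by fixed strings so it runs unattended:

```python
import re

sentence = "this is a sentence this this"
word = "this"
# match.span() is equivalent to (match.start(), match.end())
spans = [match.span() for match in re.finditer(re.escape(word), sentence)]
print(spans)  # [(0, 4), (19, 23), (24, 28)]
```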

Answer 4

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Answer 5

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example:

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns:

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer 6

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

but it won't find overlapping matches:

>>> aString = "ababa"
>>> print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
[(0, 3)]
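To also catch the overlapping case, the lookahead trick from Answer 1 applies here. Because a lookahead match has zero width, `m.end()` would equal `m.start()`, so this sketch computes the end from the pattern length instead, an assumption that holds for a literal pattern, which `re.escape` guarantees:

```python
import re

aString = "ababa"
pattern = "aba"
lookahead = '(?=' + re.escape(pattern) + ')'
# each lookahead match is empty, so derive the end index from len(pattern)
spans = [(m.start(), m.start() + len(pattern)) for m in re.finditer(lookahead, aString)]
print(spans)  # [(0, 3), (2, 5)]
```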

Answer 7

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.
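A caveat worth checking before using the recursive version on large inputs: every match adds a nested call, so a string with more matches than Python's recursion limit (see `sys.getrecursionlimit()`) raises `RecursionError`. The snippet below repeats the function so it runs on its own and demonstrates the failure; the loop-based answers on this page do not have this limit.

```python
import sys

def locations_of_substring(string, substring):
    """Same recursive function as in the answer, repeated so this snippet runs on its own."""
    substring_length = len(substring)

    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location + substring_length)
        return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))  # [0, 27, 36]

# One nested call per match: more matches than the recursion limit makes the call fail
many_matches = 'a' * (sys.getrecursionlimit() + 10)
raised = False
try:
    locations_of_substring(many_matches, 'a')
except RecursionError:
    raised = True
print(raised)  # True
```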

Answer 8

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.
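For plain counting, the built-in `str.count` gives the same numbers without `reduce` or a temporary list from `split`. Note that, like the `split` trick, it counts non-overlapping occurrences only:

```python
print("dooobiedoobiedoobie".count('o'))     # 7
print("test test test test".count("test"))  # 4
print("aaaa".count("aa"))                   # 2, not 3: matches do not overlap
```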

Answer 9

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

Answer 10

This does the trick for me using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
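Note that the pattern 'as' also matches inside longer words such as 'was' or 'class'. If you want whole words only, here is a sketch using `\b` word boundaries. It assumes the search word starts and ends with letters, digits or underscores, since that is what `\b` looks for; `find_word` is a hypothetical helper name.

```python
import re

def find_word(text, word):
    # \b only matches between a word character and a non-word character,
    # so 'as' inside 'was' or 'class' is skipped
    pattern = r'\b' + re.escape(word) + r'\b'
    return [m.start() for m in re.finditer(pattern, text)]

sample = 'as was said, class as usual'
print([m.start() for m in re.finditer('as', sample)])  # [0, 4, 15, 19]
print(find_word(sample, 'as'))                          # [0, 19]
```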

Answer 11

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # str.index raises ValueError once there are no more matches
        position = numberString.index(testString, marker)
    except ValueError:
        break
    print(position)
    marker = position + 1

Answer 12

You can try:

>>> string = "test test test test"
>>> for index in range(len(string)):
...     if string[index:index + len("test")] == "test":
...         print(index)
...
0
5
10
15

Answer 13

The solutions provided by others all rely on find() or other built-in methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and use the function below as a method:

class newstr(str):
    def find_all(self, substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: the search string
        Return: Returning a list
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method:

newstr('Do you find this answer helpful? then upvote this!').find_all('this')  # [12, 45]
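The loop above compares a full slice at every index, which costs roughly O(n*m) for a text of length n and a substring of length m. As a swapped-in alternative rather than this answer's method, the Knuth-Morris-Pratt (KMP) algorithm finds all occurrences, including overlapping ones, in O(n + m) by precomputing how far the pattern can fall back after a mismatch. A minimal sketch; `kmp_find_all` is a hypothetical name:

```python
def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence of pattern in text."""
    if not pattern:
        # mirror str.find, which matches the empty string at every index
        return list(range(len(text) + 1))

    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    result = []
    k = 0  # number of pattern characters matched so far
    for i, char in enumerate(text):
        while k > 0 and char != pattern[k]:
            k = failure[k - 1]
        if char == pattern[k]:
            k += 1
        if k == len(pattern):
            result.append(i - k + 1)
            # fall back instead of resetting, so overlapping matches are still found
            k = failure[k - 1]
    return result

print(kmp_find_all('test test test test', 'test'))  # [0, 5, 10, 15]
print(kmp_find_all('ababa', 'aba'))                 # [0, 2]
```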

Answer 14

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on large lists of search words.
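For comparison, the regex approach the answer measures flashtext against can be written as a single alternation over all keywords. This sketch uses only the standard library and assumes whole-word keywords. It reports (keyword, start, end) triples, roughly the information `span_info=True` asks flashtext for.

```python
import re

words = ['test', 'exam', 'quiz']
txt = 'this is a test, not a quiz'

# Longest keywords first, so a multi-word keyword like 'new york' would win over 'new'
alternation = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
pattern = re.compile(r'\b(?:' + alternation + r')\b')

result = [(m.group(), m.start(), m.end()) for m in pattern.finditer(txt)]
print(result)  # [('test', 10, 14), ('quiz', 22, 26)]
```

With thousands of keywords, this one large alternation is where regex tends to slow down, which is the case the answer recommends flashtext for.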

Answer 15

This function does not look at every position inside the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

Answer 16

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

Answer 17

You can try:

import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]

Answer 18

I think the cleanest solution uses neither libraries nor yield:

def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)

find_all_occurrences("test test test test", "test")  # [0, 5, 10, 15]

Note: the find() method returns -1 when it can't find anything.

Answer 19

The pythonic way would be:

mystring = 'Hello World, this should work!'
# c[x] == s compares one character at a time, so s must be a single character
find_all = lambda c, s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search string
# c represents the character string

find_all(mystring, 'o')    # will return all positions of 'o'
# [4, 7, 20, 26]

Answer 20

This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1, -1))
else:
    # the lookahead finds overlapping matches; re.escape treats b as literal text
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for start in start_indc:
        print((start, start + len(b) - 1))

Sample input and output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

Answer 21

If you only want to use NumPy, here is a solution:

import numpy as np

S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)

Answer 22

def find_index(string, let):
    enumerated = [place  for place, letter in enumerate(string) if letter == let]
    return enumerated

For example:

find_index("hey doode find d", "d")

returns:

[4, 7, 13, 15]

Answer 23

Not exactly what the OP asked, but you could also use the split function to get a list of the pieces between the occurrences of the substring. The OP didn't specify the end goal of the code, but if your goal is to remove the substrings anyway, this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case.

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

I only briefly skimmed the other answers, so apologies if this is already up there.

Answer 24

def count_substring(string, sub_string):
    c=0
    for i in range(0, len(string) - len(sub_string) + 1):  # last valid start is len(string) - len(sub_string)
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c

if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    
    count = count_substring(string, sub_string)
    print(count)
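Since this loop counts overlapping occurrences, it can give a different answer from the built-in `str.count`, which only counts non-overlapping ones. A small comparison, using a compact `startswith` variant of the loop (`count_overlapping` is a hypothetical name, not part of the answer):

```python
def count_overlapping(string, sub_string):
    # str.startswith(prefix, start) checks for a match beginning at index start
    return sum(1 for i in range(len(string)) if string.startswith(sub_string, i))

print(count_overlapping('ABCDCDC', 'CDC'))  # 2 (matches at 2 and 4 share a 'C')
print('ABCDCDC'.count('CDC'))               # 1
```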

Answer 25

I ran into the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).

All in all, it works as intended for what I was doing.

Edit: Please note this is for single characters only, and it changes your variable, so you have to copy the string into a new variable first if you want to keep it. I didn't put that in the code because it's easy, and the code is only meant to show how I made it work.

Answer 26

If you want to do it without re (regex), then:

find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]

string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]

Answer 27

def find_index(word, letter):
    index_list=[]
    for i in range(len(word)) :
        if word[i]==letter:
            index_list.append(i)
    return index_list

index_of_e=find_index('Getacher','e')
print(index_of_e)  # [1, 6]

Answer 28

Here's a solution that I came up with, using an assignment expression (new in Python 3.8):

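string = "test test test test"
phrase = "test"
start = -len(phrase)
# step past each match so the positions agree with str.count, which counts non-overlapping matches
result = [(start := string.find(phrase, start + len(phrase))) for _ in range(string.count(phrase))]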

Output:输出：

[0, 5, 10, 15]

Answer 29

To count each character in a given string and return the counts as a dictionary, e.g. for "hello" the result is {'h': 1, 'e': 1, 'l': 2, 'o': 1}:

def count(string):
    result = {}
    if string:
        for i in string:
            result[i] = string.count(i)
        return result
    return {}

Or else you can do it like this:

from collections import Counter

def count(string):
    return Counter(string)
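Both versions above return counts rather than positions. If you want the positions of every character, which is closer to the original question, here is a sketch with `collections.defaultdict` (`char_positions` is a hypothetical name):

```python
from collections import defaultdict

def char_positions(string):
    """Map each character to the list of indexes where it occurs."""
    positions = defaultdict(list)
    for index, char in enumerate(string):
        positions[char].append(index)
    return dict(positions)

print(char_positions('hello'))  # {'h': [0], 'e': [1], 'l': [2, 3], 'o': [4]}
```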

Answer 30

Try this, it worked for me!

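x = input('enter the string: ')
y = input('enter the substring: ')
# search inward from both ends: z is the next match from the front, r from the back
z, r = x.find(y), x.rfind(y)
while z != -1 and z < r:
    print(z, r, end=' ')
    # look only between the two matches just printed
    z, r = x.find(y, z + len(y), r), x.rfind(y, z + len(y), r)
if z != -1 and z == r:
    print(z, end=' ')  # one match left in the middle
print()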

Answer 31

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

Answer 32

By slicing, we find all possible substrings, append them to a list, and use the count function to find how many times the target occurs:

s = input()
n = len(s)
l = []
f = input()
for i in range(0, n):
    # j starts at i + 1 so that only non-empty slices are collected
    for j in range(i + 1, n + 1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))

Answer 33

You can easily use:您可以轻松使用：

string.count('test')

https://www.programiz.com/python-programming/methods/string/count

Cheers!

32 answers

Answer 1: 705 votes, accepted, 2011-01-12 02:43:23
Answer 2: 158 votes, 2011-01-12 03:13:28
Answer 3: 74 votes, 2016-02-03 19:01:18
Answer 4: 69 votes, 2011-01-12 02:48:05
Answer 5: 55 votes, 2015-12-23 23:09:02
Answer 6: 25 votes, 2011-01-12 02:55:19
Answer 7: 20 votes, 2013-11-01 03:16:00
Answer 8: 13 votes, 2014-09-24 21:12:28
Answer 9: 10 votes, 2015-04-01 09:23:24
Answer 10: 9 votes, 2018-07-06 09:34:35
Answer 11: 6 votes, 2014-09-01 12:48:11
Answer 12: 6 votes, 2018-02-27 06:44:02
Answer 13: 2 votes, 2018-02-15 20:02:23
Answer 14: 2 votes, 2018-09-28 17:29:11
Answer 15: 2 votes, 2020-01-13 12:39:43
Answer 16: 2 votes, 2020-05-16 17:05:55
Answer 17: 2 votes, 2021-10-25 10:13:42
Answer 18: 2 votes, 2022-10-13 20:06:12
Answer 19: 1 vote, 2018-04-10 19:40:59
Answer 20: 1 vote, 2020-01-20 22:47:51
Answer 21: 1 vote, 2021-06-10 16:46:44
Answer 22: 0 votes, 2020-11-08 13:49:48
Answer 23: 0 votes, 2021-05-19 13:43:55
Answer 24: 0 votes, 2021-06-02 03:24:06
Answer 25: 0 votes, 2021-06-25 20:18:14
Answer 26: 0 votes, 2021-11-05 08:38:13
Answer 27: 0 votes, 2021-12-01 21:51:45
Answer 28: 0 votes, 2022-04-08 10:06:17
Answer 29: 0 votes, 2022-04-30 08:00:02
Answer 30: 0 votes, 2022-06-09 13:17:37
Answer 31: -1 votes, 2017-03-16 01:14:15
Answer 32: -1 votes, 2019-07-30 11:44:03
Answer 33: -3 votes, 2018-12-01 19:09:13
