How to find all occurrences of a substring?
Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() that can return all found indexes (not only the first from the beginning or the first from the end).
For example:
string = "test test test test"
print string.find('test') # 0
print string.rfind('test') # 15
#this is the goal
print string.find_all('test') # [0,5,10,15]
For counting the occurrences, see Count number of occurrences of a substring in a string.
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you're only iterating through the results once.
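One caveat for the regex approach above: re.finditer treats its first argument as a pattern, so a search string containing characters such as . or ( would need escaping. A minimal sketch, assuming a hypothetical helper named find_all_regex (it is not part of the re module), which wraps the literal with re.escape and optionally uses the lookahead trick for overlapping matches:

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Return start indexes of the literal substring sub in text.

    find_all_regex is a hypothetical helper name used for illustration.
    """
    if not sub:
        raise ValueError("sub must be a non-empty string")
    literal = re.escape(sub)  # treat sub as plain text, not as a pattern
    # A zero-width lookahead consumes nothing, so overlapping hits are kept.
    pattern = "(?=%s)" % literal if overlapping else literal
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_regex("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_regex("ttt", "tt", overlapping=True))  # [0, 1]
print(find_all_regex("a.b a.b axb", "a.b"))           # [0, 4]
```

Without re.escape, the last call would also report index 8, because an unescaped . matches the x in axb.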
>>> help(str.find)
Help on method_descriptor:
find(...)
S.find(sub [,start [,end]]) -> int
Thus, we can build it ourselves:
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches
list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]
No temporary strings or regexes required.
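One edge case worth guarding in a str.find loop like this: an empty substring is found at every position, and advancing by len(sub) == 0 would never move forward. A minimal sketch of the same idea with that guard added, assuming an extra overlapping parameter that is not in the answer above:

```python
def find_all(a_str, sub, overlapping=False):
    """Yield each index where sub starts in a_str (sketch with an added guard)."""
    if not sub:
        # str.find('') always succeeds, so the loop would never advance.
        raise ValueError("sub must be non-empty")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all("spam spam spam spam", "spam")))  # [0, 5, 10, 15]
print(list(find_all("ttt", "tt")))                    # [0]
print(list(find_all("ttt", "tt", overlapping=True)))  # [0, 1]
```

Because this is a generator, the ValueError is raised when iteration starts, not when find_all is called.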
Use re.finditer:
import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(word, sentence):
    print((match.start(), match.end()))
For word = "this" and sentence = "this is a sentence this this" this will yield the output:
(0, 4)
(19, 23)
(24, 28)
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
Again, old thread, but here's my solution using a generator and plain str.find.
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
returns
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print [(a.start(), a.end()) for a in list(re.finditer('is', aString))]
[(2, 4), (5, 7), (38, 40), (42, 44)]
but it won't work for overlapping matches:
In [1]: aString="ababa"
In [2]: print [(a.start(), a.end()) for a in list(re.finditer('aba', aString))]
Output: [(0, 3)]
Come, let us recurse together.
def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""
    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found
    return recurse([], 0)
print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
No need for regular expressions this way.
If you're just looking for a single character, this would work:
from functools import reduce  # reduce is not a builtin in Python 3
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7
Also,
string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4
My hunch is that neither of these (especially #2) is terribly performant.
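A hunch like this can be checked with the standard timeit module. The sketch below compares the split-based count with the built-in str.count on a longer string; the function names and the repetition count are illustrative choices, and the timings will vary by machine:

```python
import timeit

string = "test test test test" * 1000
match = "test"

def count_with_split():
    # Builds a temporary list of all the pieces between matches.
    return len(string.split(match)) - 1

def count_with_count():
    # Built-in non-overlapping count, no temporary list.
    return string.count(match)

# Both approaches should agree before their speed is compared.
assert count_with_split() == count_with_count()

for fn in (count_with_split, count_with_count):
    elapsed = timeit.timeit(fn, number=1000)
    print(fn.__name__, round(elapsed, 4))
```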
This is an old thread, but I got interested and wanted to share my solution.
def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result
It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.
This does the trick for me using re.finditer:
import re
text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'
# find all occurrences of the word 'as' in the above text
find_the_word = re.finditer('as', text)
for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
This thread is a little old but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
    try:
        print(numberString.index(testString, marker))
        marker = numberString.index(testString, marker) + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)
You can try:
>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+len("test")] == "test":
...         print(index)
...
0
5
10
15
The solutions provided by others are all based on the built-in find() or other available methods.
What is the core basic algorithm to find all the occurrences of a substring in a string?
def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes
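The loop above compares a slice at every position, which can cost on the order of len(string) * len(substring) comparisons in the worst case. As one possible alternative (not something this answer proposes), the Knuth-Morris-Pratt algorithm precomputes a failure table so the scan never moves backwards in the text. A minimal sketch, with kmp_find_all as a hypothetical name:

```python
def kmp_find_all(text, pattern):
    """Return start indexes of every (possibly overlapping) match of pattern."""
    if not pattern:
        return []
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    indexes = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            indexes.append(i - k + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return indexes

print(kmp_find_all("test test test test", "test"))  # [0, 5, 10, 15]
print(kmp_find_all("ababa", "aba"))                 # [0, 2]
```

In practice str.find is implemented in C and is usually fast enough; the sketch is mainly useful for seeing how the matching can be done without built-ins.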
You can also subclass str and use the function below as a method of the new class.
class newstr(str):
    def find_all(string,substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes
Calling the method:
newstr.find_all('Do you find this answer helpful? then upvote this!','this')
When looking for a large number of keywords in a document, use flashtext:
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)
Flashtext runs faster than regex on a large list of search words.
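For comparison, the regex baseline that this answer contrasts with flashtext might look like the sketch below, which uses only the standard library. It joins the escaped keywords into one alternation pattern; the sample text and the word-boundary choice (\b) are assumptions made for illustration:

```python
import re

words = ['test', 'exam', 'quiz']
txt = 'this is a test before the exam'

# One alternation pattern; \b keeps matches on word boundaries.
pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))

# Collect (keyword, start, end) for every match in the text.
result = [(m.group(), m.start(), m.end()) for m in pattern.finditer(txt)]
print(result)  # [('test', 10, 14), ('exam', 26, 30)]
```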
This function does not look at every position inside the string, so it does not waste compute resources. My try:
def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions
To use it, call it like this:
result=findAll('this word is a big word man how many words are there?','word')
src = input() # we will find substring in this string
sub = input() # substring
res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)
You can try:
import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]
I think the cleanest solution uses no libraries and no yield:
def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)
find_all_occurrences(string, substr)
Note: the find() method returns -1 when it can't find anything.
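The -1 return value is what lets these loops terminate, and it is also the main difference from str.index, which raises ValueError instead. A small sketch of the two behaviours (the sample text is made up for illustration):

```python
text = "this word is a big word"

print(text.find("word"))     # 5
print(text.find("missing"))  # -1, no exception

try:
    text.index("missing")
except ValueError as exc:
    # index() signals "not found" with an exception instead of -1
    print("index() raised:", exc)
```

Since -1 is also a valid index (the last character), the result of find() should be compared against -1 before it is used for slicing or indexing.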
A Pythonic way, for single-character searches, would be:
mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]
# s represents the search string
# c represents the character string
find_all(mystring,'o') # will return all positions of 'o'
[4, 7, 20, 26]
This is a solution to a similar question from HackerRank. I hope this helps.
import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    # collect the start index of every (possibly overlapping) match
    start_indc = [m.start() for m in re.finditer('(?=' + b + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))
Output:
aaadaa
aa
(0, 1)
(1, 2)
(4, 5)
If you only want to use numpy, here is a solution:
import numpy as np
S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)
def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated
For example:
find_index("hey doode find d", "d")
returns:
[4, 7, 13, 15]
Not exactly what OP asked, but you could also use the split function to get a list of the pieces where the substring does not occur. OP didn't specify the end goal of the code, but if your goal is to remove the substrings anyway, then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case.
# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']
# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'
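If removal or substitution really is the end goal, str.replace does it in one step without the intermediate list. A minimal sketch using the same example string:

```python
s = "an-example-string"

# Replace every occurrence of the substring with a space, or drop it entirely.
print(s.replace("-", " "))  # an example string
print(s.replace("-", ""))   # anexamplestring
```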
I only skimmed the other answers, so apologies if this is already up there.
def count_substring(string, sub_string):
    c=0
    # stop at the last position where sub_string could still fit
    for i in range(0,len(string)-len(sub_string)+1):
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)
I ran into the same problem and did this:
hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []
while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break
I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).
All in all, it works as intended for what I was doing.
Edit: Please note that this is for single characters only, and it will change your variable, so you have to create a copy of the string in a new variable to save it. I didn't put that in the code because it's easy and the code is only meant to show how I made it work.
If you want to do it without re (regex), then:
find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]
string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]
def find_index(word, letter):
    index_list=[]
    for i in range(len(word)):
        if word[i]==letter:
            index_list.append(i)
    return index_list
index_of_e=find_index('Getacher','e')
print(index_of_e) # will give [1, 6]
Here's a solution that I came up with, using an assignment expression (a new feature since Python 3.8):
string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]
Output:
[0, 5, 10, 15]
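The same assignment expression also fits a plain while loop, which avoids the extra string.count() pass. This is a sketch of that variant, not the answer's own code:

```python
string = "test test test test"
phrase = "test"

result = []
start = -1
# Keep searching from just past the previous hit until find() reports -1.
while (start := string.find(phrase, start + 1)) != -1:
    result.append(start)

print(result)  # [0, 5, 10, 15]
```

One difference to be aware of: str.count counts non-overlapping occurrences, so for "ttt" and "tt" the list-comprehension version stops after index 0, while this loop also reports index 1.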
To find all the occurrences of each character in a given string and return them as a dictionary, e.g. hello gives {'h':1, 'e':1, 'l':2, 'o':1}:
def count(string):
    result = {}
    if(string):
        for i in string:
            result[i] = string.count(i)
        return result
    return {}
Or else you can do it like this:
from collections import Counter
def count(string):
    return Counter(string)
Try this, it worked for me!
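x=input('enter the string')
y=input('enter the substring')
z,r=x.find(y),x.rfind(y)
while z!=-1 and z<r:
    print(z,r,end=' ')
    # narrow the search to the text strictly between the two matches
    z,r=x.find(y,z+len(y),r),x.rfind(y,z+len(y),r)
if z!=-1:
    print(z)  # a single match is left in the middle
Please look at the code below: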
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result
if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
By slicing, we find all possible substrings, append them to a list, and find the number of times the target occurs using the count function:
s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))
You can easily use:
string.count('test')!
https://www.programiz.com/python-programming/methods/string/count
Cheers!
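Note that str.count answers "how many" rather than "where", and it counts non-overlapping occurrences only. A small sketch of the difference, using a made-up string chosen so that matches overlap:

```python
text = "aaaa"
sub = "aa"

# str.count skips past each match, so overlapping hits are not counted.
non_overlapping = text.count(sub)

# Checking every start position counts overlapping hits too.
overlapping = sum(text.startswith(sub, i) for i in range(len(text)))

print(non_overlapping)  # 2
print(overlapping)      # 3
```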