How to find all occurrences of a substring?
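Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() that can return all found indexes (not only the first from the beginning or the first from the end).
For example: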
string = "test test test test"
print(string.find('test'))  # 0
print(string.rfind('test'))  # 15
# this is the goal (str has no find_all method)
print(string.find_all('test'))  # [0, 5, 10, 15]
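To count the occurrences, see Count number of occurrences of a substring in a string.
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions: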
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
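One caveat: re.finditer treats its first argument as a pattern, so a substring containing characters such as '.' or '?' is not matched literally. A minimal sketch, assuming you want literal matching, escapes the needle with re.escape; the helper name find_all_regex is hypothetical.

```python
import re

def find_all_regex(haystack, needle):
    """Start indices of non-overlapping occurrences of a literal needle."""
    # re.escape makes characters such as '.', '?' or '+' match literally
    return [m.start() for m in re.finditer(re.escape(needle), haystack)]

print(find_all_regex('a.b a.b axb', 'a.b'))                # [0, 4]
print([m.start() for m in re.finditer('a.b', 'a.b a.b axb')])  # [0, 4, 8]: '.' matched 'x'
```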
If you want to find overlapping matches, a lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
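The same lookahead trick can be wrapped in a small helper. This is a sketch: find_all_overlapping is a hypothetical name, and re.escape is added so the needle is taken literally.

```python
import re

def find_all_overlapping(haystack, needle):
    """Start indices of every occurrence of needle, overlaps included."""
    # A zero-width lookahead consumes no characters, so the regex engine
    # moves forward one position at a time and reports overlapping matches.
    pattern = '(?=' + re.escape(needle) + ')'
    return [m.start() for m in re.finditer(pattern, haystack)]

print(find_all_overlapping('ttt', 'tt'))   # [0, 1]
print(find_all_overlapping('aaaa', 'aa'))  # [0, 1, 2]
```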
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
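re.finditer returns an iterator, so you could change the [] above to () to get a generator instead of a list, which is more efficient if you only iterate through the results once.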
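A short sketch of that lazy variant: the generator expression pulls matches from re.finditer only as they are requested, and, like any generator, it can be consumed only once.

```python
import re

text = 'test test test test'
# Parentheses instead of brackets: a lazily evaluated generator expression
positions = (m.start() for m in re.finditer('test', text))

print(next(positions))  # 0, only the first match has been produced so far
print(list(positions))  # [5, 10, 15], the remaining matches
```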
>>> help(str.find)
Help on method_descriptor:
find(...)
S.find(sub [,start [,end]]) -> int
So, we can build it ourselves:
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches
list(find_all('spam spam spam spam', 'spam'))  # [0, 5, 10, 15]
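Two details of this generator can bite: with an empty sub, len(sub) is zero, so start never advances and the loop never ends; and the overlapping behaviour lives only in a comment. A sketch that guards the first case and exposes the second as a parameter (overlapping is a hypothetical addition, not part of the answer above):

```python
def find_all(a_str, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in a_str."""
    if not sub:
        # str.find('') succeeds at every index and a step of 0 never advances,
        # so refuse the empty substring (raised on first iteration, since
        # this is a generator).
        raise ValueError('sub must be a non-empty string')
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all('spam spam spam spam', 'spam')))   # [0, 5, 10, 15]
print(list(find_all('aaaa', 'aa')))                    # [0, 2]
print(list(find_all('aaaa', 'aa', overlapping=True)))  # [0, 1, 2]
```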
No temporary strings or regexes required.
Use re.finditer:
import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(word, sentence):
    # print a tuple so the output has the (start, end) form shown below
    print((match.start(), match.end()))
For word = "this" and sentence = "this is a sentence this this", this will produce the output:
(0, 4)
(19, 23)
(24, 28)
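Because the pattern is a plain substring, this also reports "this" inside a longer word such as "thistle". If only whole words are wanted, a sketch using the \b word-boundary anchor could look like this (it assumes the word consists of letters, digits or underscores; find_word_spans is a hypothetical name):

```python
import re

def find_word_spans(sentence, word):
    """(start, end) spans of word appearing as a whole word in sentence."""
    # \b marks a word boundary, so 'this' inside 'thistle' is skipped;
    # re.escape keeps punctuation in word from being read as regex syntax.
    pattern = r'\b' + re.escape(word) + r'\b'
    return [m.span() for m in re.finditer(pattern, sentence)]

print(find_word_spans('this is a sentence this this', 'this'))  # [(0, 4), (19, 23), (24, 28)]
print(find_word_spans('this thistle', 'this'))                  # [(0, 4)]
```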
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
Again, an old thread, but here's my solution using a generator and plain str.find.
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
returns
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]
But it won't work for:
In [1]: aString = "ababa"
In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
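Overlapping spans can still be recovered with re.finditer by putting the needle inside a capturing lookahead and reading the group's span. A sketch for the "ababa" case:

```python
import re

a_string = 'ababa'
# The match itself is empty (zero-width lookahead), but capture group 1
# still records where each overlapping 'aba' lies.
spans = [m.span(1) for m in re.finditer('(?=(aba))', a_string)]
print(spans)  # [(0, 3), (2, 5)]
```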
Come on, let's recurse together.
def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""
    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found
    return recurse([], 0)
print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
This way, no regular expressions are needed.
If you're only looking for a single character, this would work:
from functools import reduce  # needed on Python 3, where reduce is not a builtin
string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7
Also,
string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4
My gut feeling is that neither of these (especially #2) is terribly performant.
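Both snippets above count occurrences rather than return their positions, and for multi-character substrings the split trick, like str.count, only sees non-overlapping matches. A small comparison, with count_overlapping a hypothetical helper built on str.startswith:

```python
def count_overlapping(string, sub):
    """Count occurrences of sub in string, overlaps included."""
    # Check every start index at which sub could still fit
    return sum(1 for i in range(len(string) - len(sub) + 1) if string.startswith(sub, i))

s = 'aaaa'
print(s.count('aa'))               # 2: str.count skips past each match it finds
print(len(s.split('aa')) - 1)      # 2: split also consumes non-overlapping matches
print(count_overlapping(s, 'aa'))  # 3: matches start at 0, 1 and 2
```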
This is an old thread, but I got interested and wanted to share my solution.
def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1  # change to k += len(sub) to not search overlapping results
    return result
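It should return a list of the positions where the substring was found. Please comment if you see an error or room for improvement.
This did the trick for me, using re.finditer: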
import re
text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'
# find all occurrences of the word 'as' in the above text
find_the_word = re.finditer('as', text)
for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
This thread is a little old, but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
    try:
        # search for testString (previously defined but unused) from marker onward
        print(numberString.index(testString, marker))
        marker = numberString.index(testString, marker) + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)
You can try:
>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+len("test")] == "test":
...         print(index)
...
0
5
10
15
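Every solution provided by others relies entirely on the built-in find() method or on some other existing method.
What is the core, basic algorithm for finding all occurrences of a substring in a string?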
def find_all(string, substring):
    """
    Function: Return all the indexes of substring in string
    Arguments: the string and the search string
    Return: a list of indexes
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c = c + 1
    return indexes
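The loop above is the naive algorithm: it compares up to len(substring) characters at every position, which is O(n·m) in the worst case. A classic textbook alternative that also avoids find() is the Knuth-Morris-Pratt (KMP) algorithm, which precomputes a failure table so the text is scanned only once. The sketch below is a swapped-in technique rather than the answerer's code, and it reports overlapping matches:

```python
def kmp_find_all(text, pattern):
    """Start indices of all (possibly overlapping) occurrences of pattern in
    text, using Knuth-Morris-Pratt in O(len(text) + len(pattern)) time."""
    if not pattern:
        raise ValueError('pattern must be non-empty')
    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back in the pattern, never re-scan text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going so overlapping matches are found
    return matches

print(kmp_find_all('test test test test', 'test'))  # [0, 5, 10, 15]
print(kmp_find_all('ababa', 'aba'))                 # [0, 2]
```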
You can also subclass str and use this function as a method, as below.
class newstr(str):
    def find_all(self, substring):
        """
        Function: Return all the indexes of substring in the string
        Arguments: the search string
        Return: a list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes
Calling the method:
newstr.find_all('Do you find this answer helpful? Then upvote it!', 'this')
When searching a document for a large number of keywords, use flashtext:
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)
Flashtext runs faster than regular expressions when there are many search terms.
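If installing a package is not an option, one dependency-free alternative is a single compiled regex alternation of the escaped keywords, which finds every keyword in one pass. This is a sketch, not a reimplementation of flashtext, and its (keyword, start, end) tuples are simply a convenient shape:

```python
import re

words = ['test', 'exam', 'quiz']
txt = 'this is a test'

# Longest keywords first, so a short keyword cannot shadow a longer one that
# starts at the same position (this matters if the \b anchors are dropped).
alternatives = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
pattern = re.compile(r'\b(?:' + alternatives + r')\b')

result = [(m.group(), m.start(), m.end()) for m in pattern.finditer(txt)]
print(result)  # [('test', 10, 14)]
```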
This function does not check every position in the string, so it doesn't waste compute. My attempt:
def findAll(string, word):
    all_positions = []
    next_pos = -1
    while True:
        next_pos = string.find(word, next_pos + 1)
        if next_pos < 0:
            break
        all_positions.append(next_pos)
    return all_positions
Call it like this:
result=findAll('this word is a big word man how many words are there?','word')
src = input()  # we will find substring in this string
sub = input()  # substring
res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)
You can try:
import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]
I think the cleanest solution is one without libraries and without yield:
def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)
find_all_occurrences("test test test test", "test")  # [0, 5, 10, 15]
Note: the find() method returns -1 when it can't find anything.
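Because -1 is also a valid index in Python (s[-1] is the last character), forgetting that check silently gives wrong results. str.index is the variant that raises ValueError instead; a quick comparison:

```python
s = 'test test'
print(s.find('quiz'))     # -1, the "not found" sentinel
print(s[s.find('quiz')])  # 't': indexing with -1 quietly returns the last character
try:
    s.index('quiz')       # index raises instead of returning -1
except ValueError as exc:
    print('ValueError:', exc)  # ValueError: substring not found
```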
The pythonic way would be:
mystring = 'Hello World, this should work!'
find_all = lambda c, s: [x for x in range(c.find(s), len(c)) if c[x] == s]
# s represents the search string (a single character, since c[x] == s compares one character)
# c represents the character string
find_all(mystring, 'o')  # will return all positions of 'o'
[4, 7, 20, 26]
This is a solution to a similar question from HackerRank. I hope it helps.
import re
a = input()
b = input()
if b not in a:
    print((-1, -1))
else:
    # start index of every (possibly overlapping) match; re.escape makes b literal
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i] + len(b) - 1))
Input (the first two lines, a and b) and output:
aaadaa
aa
(0, 1)
(1, 2)
(4, 5)
If you want to use only numpy, here is a solution:
import numpy as np
S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)
def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated
For example:
find_index("hey doode find d", "d")
returns:
[4, 7, 13, 15]
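Not exactly what the OP asked, but you could also use the split function to get a list of the pieces where the substring does not occur. The OP didn't specify the end goal of the code, but if your goal is to remove the substrings anyway, this can be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case.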
# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']
# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'
I only skimmed the other answers, so apologies if this is already there.
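def count_substring(string, sub_string):
    c = 0
    # check every start index at which sub_string can still fit
    # (the original range(0, len(string)-2) only worked for 3-character substrings)
    for i in range(len(string) - len(sub_string) + 1):
        if string[i:i+len(sub_string)] == sub_string:
            c += 1
    return c
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)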
I ran into the same problem and did this:
hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []
while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break
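I'm pretty new to coding, so you can probably simplify this (and of course make it a function if you plan to use it repeatedly).
It all works for what I'm doing.
Edit: Please note that this only works for single characters, and it modifies your variable, so you would have to copy the string into a new variable to keep the original. I didn't include that in the code because it's easy and this is only meant to show how I got it to work.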
If you want to do it without re (regex), then:
find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]
string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]
def find_index(word, letter):
    index_list = []
    for i in range(len(word)):
        if word[i] == letter:
            index_list.append(i)
    return index_list
index_of_e = find_index('Getacher', 'e')
print(index_of_e)  # will give [1, 6]
Here's a solution I came up with, using assignment expressions (new in Python 3.8):
string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]
Output:
[0, 5, 10, 15]
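One subtlety: str.count counts non-overlapping occurrences, while stepping by start + 1 finds overlapping ones, so the two can disagree. For string = 'aaaa' and phrase = 'aa', the comprehension above yields [0, 1], which is neither the non-overlapping answer [0, 2] nor the overlapping [0, 1, 2]. A sketch that steps by len(phrase) keeps them consistent (still Python 3.8+):

```python
string = 'aaaa'
phrase = 'aa'

# Advancing by len(phrase) keeps find() in step with str.count(), which only
# counts non-overlapping occurrences. Starting at -len(phrase) makes the
# first call search from index 0.
start = -len(phrase)
result = [(start := string.find(phrase, start + len(phrase)))
          for _ in range(string.count(phrase))]
print(result)  # [0, 2]
```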
Find all occurrences of the characters in a given string and return them as a dictionary, e.g. for hello the result is {'h': 1, 'e': 1, 'l': 2, 'o': 1}:
def count(string):
    result = {}
    if string:
        for i in string:
            result[i] = string.count(i)
        return result
    return {}
Otherwise, you could do it like this:
from collections import Counter
def count(string):
    return Counter(string)
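If you need where each character occurs rather than how often, a sketch with collections.defaultdict collects the indices in a single pass (char_positions is a hypothetical helper name):

```python
from collections import defaultdict

def char_positions(string):
    """Map each character to the list of indices where it occurs."""
    positions = defaultdict(list)
    for i, ch in enumerate(string):
        positions[ch].append(i)
    return dict(positions)

print(char_positions('hello'))  # {'h': [0], 'e': [1], 'l': [2, 3], 'o': [4]}
```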
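Try this, it worked for me!
x = input('enter the string')
y = input('enter the substring')
lo, hi = 0, len(x)  # current search window is x[lo:hi]
while True:
    # first and last occurrence inside the window, searched from both ends
    z, r = x.find(y, lo, hi), x.rfind(y, lo, hi)
    if z == -1:
        break
    if z == r:  # a single occurrence left in the window
        print(z, end=' ')
        break
    print(z, r, end=' ')
    lo, hi = z + len(y), r  # shrink the window past both matches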
Please see the code below:
#!/usr/bin/env python
# coding:utf-8
'''黃哥Python'''
def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result
if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
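Using slicing, we generate every possible substring, append them to a list, and use the count function to find how many times the target occurs:
s = input()
n = len(s)
l = []
f = input()
for i in range(0, n):
    for j in range(i + 1, n + 1):  # j > i, so only non-empty slices are stored
        l.append(s[i:j])
if f in l:
    print(l.count(f))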
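You can simply use:
string.count('test')
Note that this returns the number of non-overlapping occurrences, not their positions.
https://www.programiz.com/python-programming/methods/string/count
Cheers!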
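Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.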