
Smallest range of substring in string

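I have a string S (with words indexed from 0) and a substring Q. I want to find the smallest range [L, R] in S that contains all the words in Q. There are no duplicate words in Q. How do I approach this?

For example,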

Input:
S: what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox
Q: lazy brown dog

Output: [11,15]

My code:

S = raw_input().strip().split(' ')
Q = raw_input().strip().split(' ')

count = [0 for x in range(len(Q))]
smallest_index = [0 for x in range(len(Q))]
largest_index = [0 for x in range(len(Q))]

for i in range(len(S)):
    for j in range(len(Q)):
        if S[i] == Q[j]:
            count[j] += 1
            if count[j] <= 1:
                smallest_index[j] = i
                largest_index[j] = i
            if count[j] > 1:
                largest_index[j] = i

largest_index.sort()
print "[%d," % largest_index[0],
print "%d]" % largest_index[len(Q)-1]

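This code isn't very efficient, but it works correctly. Perhaps someone will come up with a better way of handling the position information than using product. In the meantime, you can use this code to test other algorithms.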

from itertools import product

def words_range(src, query):
    # Create a dict to store the word positions in src of each query word
    pos = {s: [] for s in query}
    for i, s in enumerate(src):
        if s in pos:
            pos[s].append(i)
    print(pos)

    # Find all the ranges that hold all the query word 
    ranges = ((min(t), max(t)) for t in product(*pos.values()))
    # Find the smallest range
    return min(ranges, key=lambda t:t[1] - t[0])

# Test

src = '''what about the lazy brown fox that jumped over the other
brown one which lazy dog ate the food of the fox'''.split()
for i, s in enumerate(src):
    print(i, s)

query = 'lazy brown dog'.split()
print(words_range(src, query))

query = 'the lazy brown fox'.split()
print(words_range(src, query))

Output

0 what
1 about
2 the
3 lazy
4 brown
5 fox
6 that
7 jumped
8 over
9 the
10 other
11 brown
12 one
13 which
14 lazy
15 dog
16 ate
17 the
18 food
19 of
20 the
21 fox
{'lazy': [3, 14], 'brown': [4, 11], 'dog': [15]}
(11, 15)
{'the': [2, 9, 17, 20], 'lazy': [3, 14], 'brown': [4, 11], 'fox': [5, 21]}
(2, 5)
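
The suggestion above, to use the product-based function as a reference for testing other algorithms, could be sketched roughly as follows. The names reference_range and agrees_with_reference are hypothetical, and the sketch assumes that a candidate returns None when a query word is missing from the source. Since different algorithms may break ties differently, it compares range widths and checks coverage rather than comparing exact indices.

```python
import random
from itertools import product

def reference_range(src, query):
    """Product-based reference; returns None if a query word is missing."""
    pos = {s: [] for s in query}
    for i, s in enumerate(src):
        if s in pos:
            pos[s].append(i)
    ranges = ((min(t), max(t)) for t in product(*pos.values()))
    return min(ranges, key=lambda t: t[1] - t[0], default=None)

def agrees_with_reference(candidate, trials=200, seed=0):
    """Compare candidate(src, query) with the reference on random inputs."""
    rng = random.Random(seed)
    vocab = list("abcde")  # assumed toy vocabulary, small enough to force repeats
    for _ in range(trials):
        src = [rng.choice(vocab) for _ in range(rng.randint(1, 12))]
        query = rng.sample(vocab, rng.randint(1, 3))
        expected = reference_range(src, query)
        actual = candidate(src, query)
        if expected is None or actual is None:
            if expected is not actual:
                return False
            continue
        start, end = actual
        # Ties may be broken differently, so compare widths and coverage
        if end - start != expected[1] - expected[0]:
            return False
        if not set(query) <= set(src[start:end + 1]):
            return False
    return True
```

Calling agrees_with_reference(my_function) with any function that takes (src, query) runs the comparison. The inputs are kept short because the product-based reference grows quickly with the number of occurrences.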

Here's a slightly more efficient version of PM 2Ring's solution, which replaces the call to product with a loop:

def words_range(src, query):
    query = set(query)

    # Find all the ranges that hold all the query word 
    # We'll iterate over the input string and keep track of
    # where each word appeared last
    last_pos = {}
    ranges = []
    for i, word in enumerate(src):
        if word in query:
            last_pos[word] = i
            if len(last_pos) == len(query):
                ranges.append( (min(last_pos.values()), i) )

    # Find the smallest range
    return min(ranges, key=lambda t:t[1] - t[0])

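It isn't linear time (because of the min(last_pos.values()) call inside the loop), but it's a step in the right direction. There's probably a way to get rid of the min call (I can't think of one right now), which would make it linear.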
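
One possible way to drop the min call, offered as a sketch rather than as the answerer's own method, is to keep last_pos in a collections.OrderedDict ordered by recency. Each time a query word is seen, its entry is moved to the end, so the first entry always holds the smallest last-seen index and can be read in constant time. The function name words_range_linear is hypothetical, and returning None when a query word never appears is an assumption made here.

```python
from collections import OrderedDict

def words_range_linear(src, query):
    """Return the smallest (start, end) range of src holding every query word.

    Returns None if some query word never appears (an assumption of this sketch).
    """
    query = set(query)
    last_pos = OrderedDict()  # word -> last index seen, least recent first
    best = None
    for i, word in enumerate(src):
        if word not in query:
            continue
        last_pos[word] = i
        last_pos.move_to_end(word)  # most recently seen word goes last
        if len(last_pos) == len(query):
            # The first entry is the least recently seen word, i.e. the
            # smallest last-seen index, so no scan over the values is needed.
            start = next(iter(last_pos.values()))
            if best is None or i - start < best[1] - best[0]:
                best = (start, i)
    return best
```

With the test sentence used earlier in this thread, words_range_linear(src, 'lazy brown dog'.split()) gives (11, 15) and the query 'the lazy brown fox' gives (2, 5), matching the product-based results. Each word is handled with constant-time dictionary operations, so the loop is linear in the length of src on average.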

Here's another approach based on @PM 2Ring's answer:

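import itertools

S = 'what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox'
Q = 'lazy brown dog'

words = S.split()
# Split Q so that membership tests compare whole words, not substrings of Q
query = set(Q.split())

# Record every position at which each query word occurs in S
track = {}
for index, value in enumerate(words):
    if value in query:
        track.setdefault(value, []).append(index)

# One candidate (start, end) range per combination of positions
combination = [(min(item), max(item)) for item in itertools.product(*track.values())]

# Keep only the ranges that contain every query word, then take the narrowest
result = min((end - start, (start, end)) for start, end in combination
             if query.issubset(words[start:end + 1]))
print(result[1])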

Output:

(11, 15)

