Determining whether a string is between two other strings alphabetically

I have two lists. The first is just a list of strings. The second is a list of tuples of strings. Say I have a string s from the first list. I want to find all the pairs in the second list that s falls between alphabetically. A concrete example:

s = "QZ123DEF"

("QZ123ABC", "QZ125ZEQ") # would return as a positive match
("QF12", "QY22") # would not return as a positive match

I thought of a brute-force approach: for every tuple in the second list, check whether s is greater than the first string and less than the second. But I wanted to know if there is a better way. By the way, I'm using Python.

Here's one way using the bisect module. It requires S to be sorted first:

import bisect
import pprint
S = ['b', 'd', 'j', 'n', 's']
pairs = [('a', 'c'), ('a', 'e'), ('a', 'z')]

output = {}

for a, b in pairs:

    # `a_ind` and `b_ind` are the indices where `a` and `b` would be inserted
    # into the sorted list `S`. The slice between these indices holds exactly
    # the items of `S` that lie between `a` and `b` (inclusive).

    a_ind = bisect.bisect_left(S, a)
    b_ind = bisect.bisect_right(S, b)

    for x in S[a_ind : b_ind]:
        output.setdefault(x, []).append((a, b))

pprint.pprint(output)

Output:

{'b': [('a', 'c'), ('a', 'e'), ('a', 'z')],
 'd': [('a', 'e'), ('a', 'z')],
 'j': [('a', 'z')],
 'n': [('a', 'z')],
 's': [('a', 'z')]}
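
To see why that slice picks out the right strings, it can help to look at the two indices for one pair in isolation. This is only a minimal sketch reusing the S from above; the helper name strings_between is made up for illustration and is not part of the answer's code.

```python
import bisect

S = ['b', 'd', 'j', 'n', 's']  # must already be sorted

def strings_between(sorted_strings, a, b):
    # Hypothetical helper: every string x in sorted_strings with a <= x <= b.
    lo = bisect.bisect_left(sorted_strings, a)   # index of the first item >= a
    hi = bisect.bisect_right(sorted_strings, b)  # index just past the last item <= b
    return sorted_strings[lo:hi]

print(strings_between(S, 'a', 'c'))  # ['b']
print(strings_between(S, 'a', 'e'))  # ['b', 'd']
print(strings_between(S, 'd', 'j'))  # ['d', 'j'] (both ends are inclusive)
```

Using bisect_left for the lower bound and bisect_right for the upper bound is what makes both ends inclusive, matching the `a <= s <= b` test of the brute-force version.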

Compared with the brute-force method on random data, this is 2-3 times faster:

import bisect
import random
import string

def solve(S, pairs):
    # Note: sorts S in place before searching.
    S.sort()
    output = {}
    for a, b in pairs:
        a_ind = bisect.bisect_left(S, a)
        b_ind = bisect.bisect_right(S, b)
        for x in S[a_ind : b_ind]:
            output.setdefault(x, []).append((a, b))
    return output

def brute_force(S, pairs):
    output = {}
    for s in S:
        for a, b in pairs:
            if a <= s <= b:
                output.setdefault(s, []).append((a, b))
    return output

def get_word():
    # A "word" here is a single random ASCII letter.
    return random.choice(string.ascii_letters)

S = [get_word() for _ in range(10000)]
pairs = [sorted((get_word(), get_word())) for _ in range(1000)]

Timing comparison:

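In [1]: %timeit brute_force(S, pairs)
1 loops, best of 3: 10.2 s per loop

In [2]: %timeit solve(S, pairs)
1 loops, best of 3: 3.94 s per loop

def between(pair, val):
    # True if val lies between the two strings of the pair (inclusive).
    low, high = pair
    return low <= val <= high

s = "QZ123DEF"
# The example pairs from the question.
my_list_tuples = [("QZ123ABC", "QZ125ZEQ"), ("QF12", "QY22")]
print([tup for tup in my_list_tuples if between(tup, s)])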

Maybe, but it's still brute force.

Assuming there are only two entries in each tuple, you can use a short list comprehension:

>>> s = "QZ123DEF"
>>> testList = [("QZ123ABC", "QZ125ZEQ"), ("QF12", "QY22")]
>>> [test[0] <= s <= test[1] for test in testList]
[True, False]

This can be extended to a list S of strings, with the results stored in a dict:

>>> S = ["QZ123DEF", "QG42"]
>>> {s: [test[0] <= s <= test[1] for test in testList] for s in S}
{'QZ123DEF': [True, False], 'QG42': [False, True]}

I don't know whether this counts as brute force, but the following code works:

def foo(s, a, b):
    # True if s lies between a and b (inclusive), in either order.
    if s <= a and s >= b:
        return True
    if s >= a and s <= b:
        return True
    return False


print(foo("QZ123DEF", "QZ123ABC", "QZ125ZEQ"))  # True
print(foo("QZ123DEF", "QF12", "QY22"))  # False

If the number of pairs is large and the number of searches is also considerable, the following algorithm may be advantageous. (I regret not having had the time for any comparisons yet.)

This algorithm copies all strings from the second list into a table (strpos), where each entry holds: a) a string, and b) the index of its pair in the original list, made negative ("flagged") for the second string of each pair. Then sort this table by the string component.

Then, for a string s from the first list, find the smallest entry in strpos whose string is greater than or equal to s.

Finally, collect all indices from that entry onward to the end of the table, remembering positive indices and skipping their negative counterparts. This will give you all pairs enclosing string s.

Dump of a strpos table:

AAA at 1
BBB at 2
CCC at -1
FFF at -2
HHH at 3
LLL at -3
NNN at 4
ZZZ at -4

Results for four strings:

for ABC found AAA - CCC
for XYZ found NNN - ZZZ
for IJK found HHH - LLL
for HHH found HHH - LLL
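
The answer gives no code, so the following is only a sketch of how the table and lookup might look in Python. All names here (build_strpos, enclosing_pairs) are hypothetical. Two assumptions go beyond the description: each pair is already ordered (first <= second), and on equal strings a "first" entry sorts before a "second" entry, which is one way to get the inclusive match for HHH shown in the results. Instead of a negative index, each entry carries an explicit is_second flag.

```python
import bisect

def build_strpos(pairs):
    # Hypothetical helper: table of (string, is_second, pair_number) entries.
    # is_second plays the role of the negative ("flagged") index.
    # Assumes first <= second in every pair.
    table = []
    for number, (first, second) in enumerate(pairs, start=1):
        table.append((first, False, number))
        table.append((second, True, number))
    # Plain tuple ordering: by string, then "first" entries before "second".
    table.sort()
    return table

def enclosing_pairs(table, pairs, s):
    # Everything before pos is a string < s, or a "first" string equal to s.
    pos = bisect.bisect_left(table, (s, True))
    opened_after_s = set()  # pairs whose first string is greater than s
    found = []
    for _string, is_second, number in table[pos:]:
        if not is_second:
            opened_after_s.add(number)       # remember the positive index
        elif number not in opened_after_s:   # skip its flagged counterpart
            found.append(pairs[number - 1])
    return found

pairs = [("AAA", "CCC"), ("BBB", "FFF"), ("HHH", "LLL"), ("NNN", "ZZZ")]
strpos = build_strpos(pairs)

for s in ["ABC", "XYZ", "IJK", "HHH"]:
    print(s, enclosing_pairs(strpos, pairs, s))
# ABC [('AAA', 'CCC')]
# XYZ [('NNN', 'ZZZ')]
# IJK [('HHH', 'LLL')]
# HHH [('HHH', 'LLL')]
```

Each lookup still scans the tail of the table after the binary search, so how much this gains over the plain double loop depends on the data; no timings were taken for this sketch.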
