简体   繁体   中英

Fastest way to check if a string can be created with a list of characters in python

I need to check if a string can be created with a list of characters and return True or False.

I am using different solutions with list.count or collections.Counter.

I am also using this solution which I dont need to read through the list of characters:

def check(message, characters):
    try:
        [list(characters).remove(m) for m in message]
        return True
    except:
        return False

Is there a fastest way? for a very very very big list of characters. Counter and list count seems slower. Dont know if there is a fast pythonic way to do this.

Example:

message = "hello"
characters = "hheellooasdadsfgfdgfdhgfdlkgkfd"

check(message, characters) # this should return True or False
# characters can be a veeeeery long string

Duplicates matter so for example characters = "hheloo" would not work for message = "hello"

You could use collections.Counter() . Just build two counters and use the subtract() method to check if there're any negative counts:

>>> c1 = Counter(characters)
>>> c2 = Counter(message)
>>> c1.subtract(c2)
>>> all(v >= 0 for v in c1.values())
False

This should work in linear time.

This is not feasible in linear time, as the length of both strings matter and they need to be iterated for each character. Without having checked its actual implementation, I assume remove() is logarithmic.

def check(msg, chars):
    c = list(chars)  # Creates a copy
    try:
        for m in msg:
            c.remove(m)
    except ValueError:
        return False
    return True

if __name__ == '__main__':
    print(check('hello', 'ehlo'))
    print(check('hello', 'ehlol'))
    print(check('hello', 'ehloijin2oinscubnosinal'))

Here is another solution compared to eugene's solution and jbndlr's solution.

def test1(input_word, alphabet):
    alp_set = set(list(alphabet))
    in_set = set(list(input_word))
    return in_set.issubset(alp_set)

def test2(input_word, alphabet):
    c1 = collections.Counter(alphabet)
    c2 = collections.Counter(input_word)
    c1.subtract(c2)
    return all(v >= 0 for v in c1.values())

def check(msg, chars):
    c = list(chars)  # Creates a copy
    try:
        for m in msg:
            c.remove(m)
    except ValueError:
        return False
    return True

input_word = "hello"
alphabet = "hheellooasdadsfgfdgfdhgfdlkgkfd"


start_time = time.time()
for i in range(10000):
    test1(input_word,alphabet)
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for i in range(10000):
    test2(input_word,alphabet)
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
   for i in range(10000):
       check(input_word,alphabet)
   print("--- %s seconds ---" % (time.time() - start_time))

>> --- 0.03100299835205078 seconds ---
>> --- 0.24402451515197754 seconds ---
>> --- 0.022002220153808594 seconds ---

⇒ jbndlr's solution is the fastest - for this test case.

Another testcase:

input_word = "hellohellohellohellohellohellohellohellohellohellohellohellohello"
alphabet =   

""

>> --- 0.21964788436889648 seconds ---
>> --- 0.518169641494751 seconds ---
>> --- 1.3148927688598633 seconds ---

⇒ test1 is fastest

There is maybe a faster way of doing this, apparently due to the cost of creating the all() generator ( Why is Python's 'all' function so slow? ) perhaps a for loop is faster, Expanding on @eugene y's answer:

from collections import Counter
import time

message = "hello"
characters = "hheeooasdadsfgfdgfdhgfdlkgkfd"

def check1(message,characters):
    c1 = Counter(characters)
    c2 = Counter(message)
    c1.subtract(c2)
    return all(v > -1 for v in c1.values())

def check2(message,characters):
    c1 = Counter(characters)
    c2 = Counter(message)
    c1.subtract(c2)
    for v in c1.values():
        if v < 0:
            return False
    return True

st = time.time()
for i in range(350000):
    check1(message,characters)
end = time.time()
print ("all(): "+str(end-st))

st = time.time()
for i in range(350000):
    check2(message,characters)
end = time.time()
print ("for loop: "+str(end-st))

results:

all(): 5.201688051223755
for loop: 4.864434719085693

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM