
Python - efficiently check a list exists AND element exists in list

I have a variable foo, which points to a string, "bar":

foo = "bar"

I have a list called whitelist. If whitelist is not empty, the elements it contains form a whitelist. If whitelist is empty, then the if statement permits any string.

I have implemented this as follows:

whitelist = ["bar", "baz", "x", "y"]

if whitelist and foo in whitelist:
    print("bar is whitelisted")
    # do something with whitelisted element

if whitelist, by my understanding, checks whether whitelist evaluates to True. It evaluates to False if whitelist is empty. If whitelist contains elements, it evaluates to True.

However, the real implementation of this contains:

  • lots of strings to check, e.g. "bar", "baz", "x", "y", "a", "b"
  • lots of whitelists to check against

Therefore, I was wondering if there is a more computationally efficient way of writing the if statement. It seems like checking the existence of whitelist each time is inefficient, and could be simplified.
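A side note on the condition itself: as written, whitelist and foo in whitelist is False for every string when whitelist is empty, because the truthiness check fails first. If an empty whitelist should permit any string, as described above, a minimal sketch could look like this (is_permitted is a hypothetical helper name, not part of the original code):

```python
def is_permitted(foo, whitelist):
    """Return True if foo passes the whitelist (hypothetical helper)."""
    # An empty whitelist is falsy, so `not whitelist` short-circuits to True
    # and every string is permitted; otherwise foo must be a member.
    return not whitelist or foo in whitelist


print(is_permitted("bar", ["bar", "baz", "x", "y"]))  # True
print(is_permitted("qux", ["bar", "baz", "x", "y"]))  # False
print(is_permitted("qux", []))                        # True
```

The truthiness check itself is a constant-time length test, so it is unlikely to be the expensive part; the membership test on a list is what grows with the size of the whitelist.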

These are some ways to check whether an element is in a list or not:

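from timeit import timeit
import numpy as np

whitelist1 = {"bar", "baz", "x", "y"}            # set: hash-based lookup
whitelist2 = np.array(["bar", "baz", "x", "y"])  # sorted array, as searchsorted requires

def func1():
    # Intersect a one-element set with the whitelist set
    return {"foo"}.intersection(whitelist1)

def func2():
    # Plain membership test on a set
    return "foo" in whitelist1

def func3():
    # np.isin expects an array-like; a set would be treated as a single object
    return np.isin("foo", whitelist2)

def func4():
    # Binary search; only valid because whitelist2 is sorted
    i = np.searchsorted(whitelist2, "foo")
    return i < len(whitelist2) and whitelist2[i] == "foo"

print("func1=",timeit(func1,number=100000))
print("func2=",timeit(func2,number=100000))
print("func3",timeit(func3,number=100000))
print("func4=",timeit(func4,number=100000))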

Time taken by each function (total seconds for 100000 calls):

func1= 0.01365450001321733
func2= 0.005112499929964542
func3 0.5342871999600902
func4= 0.17057700001168996

For a randomly generated list:

from timeit import timeit
import numpy as np
import random as rn
from string import ascii_letters

# Build a list of 1000 unique random 5-letter words
randomLst = []
while len(randomLst) != 1000:
    candidate = ''.join(rn.choices(ascii_letters, k=5))
    if candidate not in randomLst:
        randomLst.append(candidate)

whitelist1 = {*randomLst}
whitelist2 = np.sort(np.array(randomLst))  # searchsorted requires sorted input
randomWord = rn.choice(randomLst)
randomWords = set(rn.choices(randomLst, k=100))  # up to 100 distinct words

def func1():
    # Intersect a one-element set with the whitelist set
    return {randomWord}.intersection(whitelist1)

def func2():
    # Plain membership test on a set
    return randomWord in whitelist1

def func3():
    # np.isin expects an array-like, so pass the array rather than the set
    return np.isin(randomWord, whitelist2)

def func4():
    # Binary search on the sorted array
    i = np.searchsorted(whitelist2, randomWord)
    return i < len(whitelist2) and whitelist2[i] == randomWord

def func5():
    # Check the whole batch of words with a single set intersection
    return randomWords & whitelist1

print("func1=",timeit(func1,number=100000))
print("func2=",timeit(func2,number=100000))
print("func3",timeit(func3,number=100000))
print("func4=",timeit(func4,number=100000))
# number is 1000 here because each call checks about 100 words at once (100000 / 100 = 1000)
print("func5=",timeit(func5,number=1000))

Time taken (seconds):

func1= 0.012835499946959317
func2= 0.005004600039683282
func3 0.5219665999757126
func4= 0.19900090002920479
func5= 0.0019264000002294779

Conclusion

  1. If you only want to check a single word, the 'in' operator on a set (func2) is the fastest.

  2. But if you have a list of words to check, the '&' set intersection (func5) is faster.

Note: func5 returns a set containing the words that are in the whitelist.
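If only a yes/no answer is needed for a batch of words rather than the matching set, the built-in set comparisons can be used instead of '&'. A small sketch with made-up data (the variable names are illustrative):

```python
whitelist = {"bar", "baz", "x", "y"}
words = {"bar", "x"}
others = {"bar", "none"}

# `<=` tests whether every word is whitelisted (subset test).
print(words <= whitelist)                # True
print(others <= whitelist)               # False

# isdisjoint() is True when nothing matches, so negate it for "any match".
print(not whitelist.isdisjoint(others))  # True
```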

whitelist would normally exist, but if it could be None, coerce it to an empty list with:

whitelist = whitelist or []
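The coercion matters because a membership test against None raises a TypeError. A minimal sketch of the idiom inside a function (check is a hypothetical helper name):

```python
def check(foo, whitelist=None):
    # `None or []` evaluates to [], so the membership test never sees None.
    whitelist = whitelist or []
    return foo in whitelist


print(check("bar", ["bar", "baz"]))  # True
print(check("bar", None))            # False
print(check("bar"))                  # False
```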

As shown above, you can then just use foo in whitelist to figure out whether it is in the list. This is an O(len(whitelist)) operation. Linear scans over a list are surprisingly fast in practice (say, even for len(whitelist) >= 1,000).

If you need it to be faster, use a set. Optionally, if you need to do n lookups, collect your foos into a set and then use an intersection, which is O(n):
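To see how a list scan compares with a set lookup on your own machine, a rough sketch using timeit follows. The data, the size of 1000 and the repeat count are arbitrary assumptions, and no timings are quoted because they vary between machines:

```python
from timeit import timeit

words = [f"word{i}" for i in range(1000)]  # illustrative data
as_list = words
as_set = set(words)
target = "word999"  # worst case for the list: the last element

# Each call returns the total time in seconds for `number` lookups.
list_time = timeit(lambda: target in as_list, number=10_000)
set_time = timeit(lambda: target in as_set, number=10_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```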

foos = { 'bar', 'none' }
whitelist = { 'bar' }
for foo in foos & whitelist:
   print(foo)
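With several whitelists, the same idea can be applied by converting each list to a set once, up front, and then doing one intersection per whitelist. A sketch under assumed data (the whitelist names and contents are made up for illustration):

```python
# Hypothetical data: several named whitelists and a batch of strings to check.
whitelists = {
    "names": ["bar", "baz"],
    "letters": ["x", "y", "a", "b"],
}
foos = {"bar", "x", "none"}

# Convert each whitelist to a frozenset once, outside any loop.
prepared = {name: frozenset(items) for name, items in whitelists.items()}

# One intersection per whitelist instead of one `in` test per string.
matches = {name: foos & allowed for name, allowed in prepared.items()}

print(matches["names"])    # {'bar'}
print(matches["letters"])  # {'x'}
```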

Here is a simplified solution. You can do it with two methods:

whitelist = ["bar", "baz", "x", "y"]
foo = "bar"
# method 1
def WhiteListExists(foo, whitelist):
    if whitelist and foo in whitelist:
        return True
    else:
        return False

exists = WhiteListExists(foo,whitelist)

# method 2
exists = True if whitelist and foo in whitelist else False

Both methods give the same result, but the second one is slightly faster because it avoids the overhead of a function call.
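The difference can be measured with timeit. This is only a sketch: whitelist_exists is a hypothetical snake_case stand-in for the function above, the repeat count is arbitrary, and the numbers will differ between machines:

```python
from timeit import timeit

whitelist = ["bar", "baz", "x", "y"]
foo = "bar"


def whitelist_exists(foo, whitelist):
    # Same check as method 1, written as a single expression.
    return bool(whitelist and foo in whitelist)


# Both variants are wrapped in a lambda so the wrapper overhead is the same.
via_function = timeit(lambda: whitelist_exists(foo, whitelist), number=100_000)
inline = timeit(lambda: bool(whitelist and foo in whitelist), number=100_000)

print(f"function: {via_function:.4f}s  inline: {inline:.4f}s")
```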

