Text searching elements in a large Python list

Question

With a list that looks something like:

cell_lines = ["LN18_CENTRAL_NERVOUS_SYSTEM","769P_KIDNEY","786O_KIDNEY"]

From my dabbling in regular expressions, I can't figure out a compelling way to search the individual strings in a list other than looping through each element and performing the search.

How can I retrieve the indices of the elements containing "KIDNEY" efficiently (my list has thousands of elements)?

Answer 1

Make a list comprehension:

[line for line in cell_lines if "KIDNEY" in line]

This is O(n), since we check every item in the list to see whether it contains KIDNEY.

If you need to make queries like this often, you should probably think about reorganizing your data into a dictionary grouped by categories like KIDNEY:
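Since the question asks for indices rather than the matching strings, the same comprehension can be combined with `enumerate`. A minimal sketch, where the name `kidney_indices` is only illustrative:

```python
cell_lines = ["LN18_CENTRAL_NERVOUS_SYSTEM", "769P_KIDNEY", "786O_KIDNEY"]

# enumerate yields (index, element) pairs; keep the index whenever the
# element contains the substring. Still a single O(n) pass over the list.
kidney_indices = [i for i, line in enumerate(cell_lines) if "KIDNEY" in line]

print(kidney_indices)  # [1, 2]
```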

{
    "KIDNEY": ["769P_KIDNEY","786O_KIDNEY"],
    "NERVOUS_SYSTEM": ["LN18_CENTRAL_NERVOUS_SYSTEM"]
}

In this case, every "by category" lookup would take constant time on average, at the cost of one O(n) pass to build the dictionary.
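One way such a dictionary might be built is sketched below. It assumes the category is everything after the first underscore, which is a guess about the naming scheme; under that assumption the first entry is filed under CENTRAL_NERVOUS_SYSTEM rather than NERVOUS_SYSTEM. It also stores indices instead of names, since that is what the question asks for. The name `by_category` is hypothetical.

```python
from collections import defaultdict

cell_lines = ["LN18_CENTRAL_NERVOUS_SYSTEM", "769P_KIDNEY", "786O_KIDNEY"]

# Assumption: "<cell line>_<category>", split at the FIRST underscore only.
by_category = defaultdict(list)
for index, line in enumerate(cell_lines):
    _, _, category = line.partition("_")
    by_category[category].append(index)

# Built once in O(n); each later lookup is a single dictionary access.
kidney_indices = by_category.get("KIDNEY", [])
print(kidney_indices)  # [1, 2]
```

If the category cannot be derived from the string this way, the grouping key has to come from wherever the category information actually lives.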

Answer 2

You can use a set instead of a list, since it performs lookups in constant time on average. This applies to exact matches of whole items. If the list is kept sorted, binary search is another option:

from bisect import bisect_left

def bi_contains(lst, item):
    """Efficient `item in lst` for sorted lists (exact matches only)."""
    # bisect_left returns the leftmost index at which item could be inserted
    # while keeping lst sorted. If item is in the list, it is at that index.
    # If item is larger than every element (or lst is empty), the index is
    # len(lst), so check the bound before indexing.
    i = bisect_left(lst, item)
    return i < len(lst) and lst[i] == item

Slightly modifying the above code will give you pretty good efficiency: binary search on a sorted list takes O(log n) per lookup. Note that it finds whole items, not a substring such as "KIDNEY" inside a longer name.

Here's a list of the data structures available in Python along with their time complexities:
https://wiki.python.org/moin/TimeComplexity
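To make the behaviour of these two approaches concrete, here is a small self-contained sketch. The helper name `sorted_contains` is hypothetical and simply stands in for a bisect-based membership test; both approaches answer "is this exact string present?" and do not match substrings.

```python
from bisect import bisect_left

cell_lines = ["LN18_CENTRAL_NERVOUS_SYSTEM", "769P_KIDNEY", "786O_KIDNEY"]

# Set: average O(1) membership test, exact matches only.
cell_set = set(cell_lines)
in_set_exact = "769P_KIDNEY" in cell_set   # True
in_set_partial = "KIDNEY" in cell_set      # False, not a whole element

# Sorted list + binary search: O(log n) membership test, exact matches only.
sorted_lines = sorted(cell_lines)

def sorted_contains(lst, item):
    """Return True if item is an element of the sorted list lst."""
    i = bisect_left(lst, item)
    return i < len(lst) and lst[i] == item

in_sorted_exact = sorted_contains(sorted_lines, "786O_KIDNEY")  # True
in_sorted_partial = sorted_contains(sorted_lines, "KIDNEY")     # False
```

For a substring query like the one in the question, a scan over the list (or a precomputed grouping) is still needed.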

2 answers

Answer 1: 1, accepted, 2015-07-22 16:06:47

Answer 2: 1, 2015-07-22 16:09:22
