Pythonic way to get index of items where two lists intersect
Say I have two lists: one is a string, 'example', and the other is the alphabet. I'd like to find a more Pythonic way to get, for each letter of the string 'example', its position in the alphabet list, and put these indices in a list.
I.e.:

etc.
So far I have:
```python
import string

alphabet = list(string.ascii_lowercase)
key = list('example')

def convert(string, alphabet):
    table_l = []
    for char in string:
        for letter in alphabet:
            if letter == char:
                table_l.append(alphabet.index(letter))
    return table_l

convert(key, alphabet)
```
I've tried using set intersection, but the string key can contain more than one of each letter, and I'm looking for indices, not which letters match.
So far, the best I've tried is:
```python
listed = []
for x in key:
    listed.append(set(alphabet).intersection(x))
```
I've no clue how to append the indices of alphabet where the value matches each letter of key.
Thanks
You want a mapping from letters to numbers, so use a mapping data structure, e.g. a dict:
```python
>>> alphamap = dict(zip(alphabet, range(len(alphabet))))
>>> alphamap
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9, 'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14, 'p': 15, 'q': 16, 'r': 17, 's': 18, 't': 19, 'u': 20, 'v': 21, 'w': 22, 'x': 23, 'y': 24, 'z': 25}
>>> def convert(string, map_):
...     return [map_[c] for c in string]
...
>>> convert('example', alphamap)
[4, 23, 0, 12, 15, 11, 4]
```
Note, your original approach could be simplified to:
```python
>>> list(map(alphabet.index, 'example'))
[4, 23, 0, 12, 15, 11, 4]
```
However, using alphabet.index is less efficient than using a mapping, since it has to do a linear search each time rather than a constant-time hash lookup.
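To see the difference in practice, here is a rough timing sketch using the standard timeit module. The test input (the alphabet repeated 1,000 times) and the run count are arbitrary assumptions, and absolute timings depend on the machine, so none are shown here.

```python
import string
import timeit

alphabet = list(string.ascii_lowercase)
alphamap = dict(zip(alphabet, range(len(alphabet))))
text = string.ascii_lowercase * 1000  # arbitrary test input: 26,000 letters

def with_index(s):
    # linear search through the 26-item list for every character
    return [alphabet.index(c) for c in s]

def with_dict(s):
    # one hash lookup per character
    return [alphamap[c] for c in s]

# both approaches must agree before comparing speed
assert with_index(text) == with_dict(text)

for func in (with_index, with_dict):
    seconds = timeit.timeit(lambda: func(text), number=20)
    print(f"{func.__name__}: {seconds:.3f}s for 20 runs")
```

With only 26 letters the gap may be modest; the cost of the linear search grows with the length of the list being searched, while the dict lookup does not.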
Also, note that I've iterated over strings directly: there's no need to put them into a list, because strings are sequences just like list objects. They can be iterated over, sliced, etc. However, they are immutable.
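A small sketch of what "strings are sequences, but immutable" means in practice; the variable names are only for illustration.

```python
word = 'example'

# iterate directly over the characters, no list() needed
letters = [c for c in word]

# indexing and slicing work just like on a list
first_three = word[:3]
last = word[-1]

# but strings are immutable: item assignment raises TypeError
try:
    word[0] = 'E'
except TypeError as exc:
    error_message = str(exc)

# to "change" a string, build a new one instead
capitalized = 'E' + word[1:]
```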
Finally, the above approach will fail if there isn't a corresponding value, i.e. for a special, non-alphabetic character:
```python
>>> convert("example!", alphamap)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in convert
  File "<stdin>", line 2, in <listcomp>
KeyError: '!'
```
This may or may not be desirable. Alternatively, you can approach this by using .get with a default value, e.g.:
```python
>>> def convert(string, map_, default=-1):
...     return [map_.get(c, default) for c in string]
...
>>> convert("example!", alphamap)
[4, 23, 0, 12, 15, 11, 4, -1]
```
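If you would rather drop characters that have no mapping than mark them with -1, a filtering comprehension is one alternative. This is a sketch: the name convert_known is hypothetical, and lower-casing the input first is an assumption about the desired behaviour (it lets uppercase letters through).

```python
import string

alphamap = dict(zip(string.ascii_lowercase, range(26)))

def convert_known(text, map_):
    """Return indices for characters found in map_, silently skipping the rest."""
    # lower() is an assumption: treat 'E' the same as 'e'
    return [map_[c] for c in text.lower() if c in map_]

print(convert_known("Example!", alphamap))  # [4, 23, 0, 12, 15, 11, 4]
```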
If it's all ASCII, something like the following should work: convert each letter to its numeric representation with ord(), then subtract 97, since that's the code for 'a' in ASCII.
```python
a = ord('a')  # 97
[ord(c) - a for c in 'example'.lower()]
```
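One caveat with the ord() arithmetic: it never raises, so non-letters silently produce out-of-range numbers (for example, ord('!') - ord('a') is -64). Below is a minimal guarded sketch, assuming -1 is an acceptable marker for anything outside a-z; the function name letter_positions is hypothetical.

```python
def letter_positions(text, default=-1):
    """Map a-z (case-insensitively) to 0-25 via ord(); anything else gets `default`."""
    base = ord('a')
    result = []
    for c in text.lower():
        offset = ord(c) - base
        # only offsets 0..25 correspond to 'a'..'z'
        result.append(offset if 0 <= offset < 26 else default)
    return result

print(letter_positions('Example!'))  # [4, 23, 0, 12, 15, 11, 4, -1]
```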
Somewhat in the same spirit as Guy's answer, what about counting in base 36 (following DyZ's and mhawke's advice):
```python
>>> a = int('a', 36)
>>> [int(c, 36) - a for c in 'example']
[4, 23, 0, 12, 15, 11, 4]
```
(This assumes the input only contains letters, which is the case since you are working with string.ascii_lowercase.)
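Note that int(c, 36) also accepts digits and uppercase letters, which may or may not be wanted: uppercase letters map like lowercase ones, digits come out negative, and punctuation raises ValueError. A short demonstration:

```python
a = int('a', 36)  # 10

print(int('E', 36) - a)  # 4: base-36 parsing is case-insensitive
print(int('7', 36) - a)  # -3: digits are valid base-36 symbols too

try:
    int('!', 36)
except ValueError as exc:
    # '!' is not a base-36 digit
    print('ValueError:', exc)
```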
Use sets.
```python
overlapKeys = set(alphabet) & set(key)
# note: a set keeps each letter only once and has no defined order
listOfIndices = [alphabet.index(letter) for letter in overlapKeys]
```
Also, key = list('example') is unnecessary. Strings are already sequences of characters, so use:

```python
key = 'example'
```
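Bear in mind that the set version answers a slightly different question: a set keeps each letter once and has no defined order, so the repeated 'e' in 'example' is lost and the indices come back in arbitrary order. Comparing it with an order-preserving comprehension makes this visible (sorted() is used only so the printed set result is deterministic):

```python
import string

alphabet = list(string.ascii_lowercase)
key = 'example'

# set-based: one index per distinct letter, order not guaranteed
via_set = [alphabet.index(letter) for letter in set(alphabet) & set(key)]

# order-preserving: one index per character of key, duplicates kept
via_loop = [alphabet.index(letter) for letter in key]

print(sorted(via_set))  # [0, 4, 11, 12, 15, 23]
print(via_loop)         # [4, 23, 0, 12, 15, 11, 4]
```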
Your example seems a little off... wouldn't x be 23, m 12, etc.?
```python
>>> import string
>>> s = 'example'
>>> [(c, string.ascii_lowercase.index(c)) for c in s]  # as a list of tuples
[('e', 4), ('x', 23), ('a', 0), ('m', 12), ('p', 15), ('l', 11), ('e', 4)]
```
This would be a little inefficient for longer strings, because each call to index() scans the alphabet linearly, which effectively makes this an O(n*m) solution, where n is the length of the string and m the size of the alphabet.
A better way is to use a lookup dictionary to convert from a character to its index. Because a dict lookup is O(1) on average, the resulting solution will be O(n), which is much better.
```python
>>> import string
>>> indices = {c: index for index, c in enumerate(string.ascii_lowercase)}  # maps characters to indices
>>> s = 'example'
>>> [(c, indices.get(c, -1)) for c in s]  # perform the conversion
[('e', 4), ('x', 23), ('a', 0), ('m', 12), ('p', 15), ('l', 11), ('e', 4)]
```
If you wanted just the indices:
```python
>>> [indices.get(c, -1) for c in s]
[4, 23, 0, 12, 15, 11, 4]
```
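Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.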