How do I iterate over a vocabulary list in Python?

Question

Here's what I mean: say I have a list. Let's call it messages.

messages = ['hey how are you', 'doing good what about you']

My end goal is to check this list against a vocabulary list and, for each message, copy every word that appears in the vocabulary into a new list. The vocabulary list looks like this:

vocab = ['hey', 'how', 'you']

(Notice that 'are' is omitted.)
The final list of my formatted data currently looks like this:

final_list = np.array([['', '', '', ''], ['', '', '', '']])

I want it to look something like this:

final_list = np.array([['hey', 'how', 'you', ''], ['you', '', '', '']])

I have an idea using a for loop and enumerate(), but it's not working too well. Help would be appreciated!

Answer 1

Go over the list of messages. For each message, split it into words, keep only the words that are in vocab, take at most N (N=4) of them, and pad with empty strings if needed.

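import numpy as np

N = 4  # number of columns in each row
data = []
for m in messages:
    # keep only the words that appear in vocab
    words = [x for x in m.split() if x in vocab]
    # truncate to N words, then pad with empty strings up to N
    data.append(words[:N] + (N - len(words)) * [""])
final_list = np.array(data)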

For better performance, convert vocab to a set before the loop, so that each membership test is a hash lookup rather than a scan of the list:

vocab = set(vocab)

Result:

array([['hey', 'how', 'you', ''],
       ['you', '', '', '']], dtype='<U3')
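The same approach can be wrapped in a small function. The following is only a minimal sketch, not the answerer's code: the name `filter_and_pad` and its parameters are made up for illustration, and it returns plain lists so that it runs without NumPy. Passing the result to `np.array` should give the array shown above.

```python
def filter_and_pad(messages, vocab, n=4):
    """Keep the in-vocabulary words of each message, truncated or padded to n items."""
    vocab_set = set(vocab)  # set membership is O(1) on average
    rows = []
    for message in messages:
        # filter by vocabulary, then keep at most n words
        words = [w for w in message.split() if w in vocab_set][:n]
        # pad with empty strings so every row has exactly n entries
        rows.append(words + [""] * (n - len(words)))
    return rows


messages = ['hey how are you', 'doing good what about you']
vocab = ['hey', 'how', 'you']
rows = filter_and_pad(messages, vocab)
print(rows)
# [['hey', 'how', 'you', ''], ['you', '', '', '']]
```

Truncating before padding keeps the pad count non-negative, so messages with more than `n` matching words are handled the same way as short ones.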

Answer 2

Try with two for loops:

vocab = ['hey', 'how', 'you']
messages = ['hey how are you', 'doing good what about you']
m = []
s = []
for x in messages:
  for y in x.split():
    if y in vocab:
      s.append(y)
  m.append(s)
  s = []
    
print(m)

To keep an empty string in place of each word that is not in vocab:

vocab = ['hey', 'how', 'you']
messages = ['hey how are you', 'doing good what about you']
m = []
s = []
for x in messages:
  for y in x.split():
    if y in vocab:
      s.append(y)
    else:
      s.append('')
  m.append(s)
  s = []
    
print(m)
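For comparison, here is a sketch of what the two variants above produce for the sample data. The function name `filter_words` and the `keep_positions` flag are hypothetical, introduced only to put both loops side by side. Note that the second variant leaves an empty string at the position of each removed word, so every row has as many entries as its message has words. That differs from the left-aligned, fixed-width layout asked for in the question.

```python
def filter_words(messages, vocab, keep_positions=False):
    """Filter each message by vocab; optionally leave '' where a word was dropped."""
    result = []
    for message in messages:
        row = []
        for word in message.split():
            if word in vocab:
                row.append(word)
            elif keep_positions:
                row.append('')  # placeholder for an out-of-vocabulary word
        result.append(row)
    return result


messages = ['hey how are you', 'doing good what about you']
vocab = ['hey', 'how', 'you']

compact = filter_words(messages, vocab)
masked = filter_words(messages, vocab, keep_positions=True)
print(compact)  # [['hey', 'how', 'you'], ['you']]
print(masked)   # [['hey', 'how', '', 'you'], ['', '', '', '', 'you']]
```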

Answer 3

A list comprehension is a compact way to do this. You can then convert the output into an array if need be.

li = ['hey how are you', 'doing good what about you']
vocab = ['hey', 'how', 'you']

# split each message so that `in` matches whole words rather than substrings
final_list = [[el if el in el2.split() else '' for el in vocab] for el2 in li]

print(final_list)

Output:

[['hey', 'how', 'you'], ['', '', 'you']]
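One point worth keeping in mind with this kind of comprehension: `in` applied to a string is a substring test, while `in` applied to the list returned by `split()` matches whole words only. The sketch below illustrates the difference. The helper name `vocab_hits` is hypothetical, and like the comprehension above it produces one slot per vocabulary entry, in vocabulary order.

```python
message = 'hey how are you'

# Membership against the raw string is a substring test.
print('he' in message)          # True
# Membership against the split words matches whole words only.
print('he' in message.split())  # False


def vocab_hits(message, vocab):
    """Return one entry per vocab word: the word if present in message, else ''."""
    words = set(message.split())
    return [v if v in words else '' for v in vocab]


hits = vocab_hits('doing good what about you', ['hey', 'how', 'you'])
print(hits)  # ['', '', 'you']
```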

Answer 1
2 (accepted) 2020-11-01 03:33:39

Answer 2
1 2020-11-01 02:50:23

Answer 3
0 2020-11-01 04:08:48
