How to get all unique words from a numpy.ndarray?

Question

I have the following ndarray, X_train, whose rows have the form [[<'title'>, <'description'>]]:

array([['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
        '\xa0/\n/\n The price is only for Instagram subscribers'], ...],
      dtype=object)

I want to get a list of all unique words. What is the fastest way to do this? Thanks for any help.

Answer 1

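I wasn't sure whether you care about the words in the title, the description, or both, so this handles both; it is easy to change.

If you want to keep track of unique items, a set is usually a good type, because adding an element that is already present has no effect.

This code builds a set of the unique words across all titles and descriptions. I added an ignore list in case there are special words you want to skip. If needed, you could use regular expressions to make this more sophisticated.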

import numpy as np

arr = np.array([['Boots new', 'Boots 46 size new'], ['iPhone 7 plus 128GB Red',
                '\xa0/\n/\n The price is only for Instagram subscribers']],
                dtype=object)

words = set()
ignore = {"/", "7"}  # tokens to leave out of the result
for title, description in arr:
    # str.split() with no argument splits on any run of whitespace,
    # including '\n' and the non-breaking space '\xa0'
    words.update(word for word in title.split() if word not in ignore)
    words.update(word for word in description.split() if word not in ignore)

print(words)

This prints the following (a set is unordered, so the order may differ):

{'price', 'Boots', 'subscribers', 'size', '46', 'Instagram', '128GB', 'new', 'plus', 'iPhone', 'is', 'only', 'for', 'The', 'Red'}
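
The answer mentions that a regular expression could make the tokenization more sophisticated. As a minimal sketch of that idea, and not part of the original answer, the version below pulls out runs of word characters with `re.findall`, so punctuation such as `/` is dropped without an ignore list. The pattern `\w+` and the name `WORD_RE` are assumptions chosen for illustration; a different definition of "word" would need a different pattern.

```python
import re

import numpy as np

arr = np.array([['Boots new', 'Boots 46 size new'],
                ['iPhone 7 plus 128GB Red',
                 '\xa0/\n/\n The price is only for Instagram subscribers']],
               dtype=object)

# Assumed definition of a word: one or more letters, digits or underscores.
WORD_RE = re.compile(r"\w+")

words = set()
# ravel() visits every cell, so titles and descriptions are treated alike.
for text in arr.ravel():
    words.update(WORD_RE.findall(text))

print(sorted(words))
```

Unlike the ignore list above, this keeps the token `7`, since it consists of word characters; filtering out pure numbers would need an extra check.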

Answer 2

I used your example as the data, but this code works regardless of the dimensions of your array.

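import numpy as np

data = np.array([['Boots new', 'Boots 46 size new'],
                 ['iPhone 7 plus 128GB Red', '\xa0/\n/\n The price is only for Instagram subscribers']])
# Each cell becomes a list of the words it contains
split_data = np.char.split(data, sep=' ')
# Adding Python lists concatenates them, so the sum is one flat list
all_words = np.sum(split_data)
unique_words = np.unique(all_words)  # sorted array of distinct words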

split_data stores the words of each cell as a list, so summing those lists concatenates them and gives you all the words. You can then apply the np.unique function.
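
One thing to be aware of in the code above is that `sep=' '` splits only on single space characters, so `'\xa0/\n/\n'` survives as one token. A possible variant, offered as a sketch and not as the answerer's method, flattens the array and uses plain `str.split()` with no argument, which splits on any run of whitespace. It assumes every cell is a Python string; the names `flat` and `tokens` are illustrative.

```python
import numpy as np

data = np.array([['Boots new', 'Boots 46 size new'],
                 ['iPhone 7 plus 128GB Red',
                  '\xa0/\n/\n The price is only for Instagram subscribers']],
                dtype=object)

# Flatten to 1-D so the shape of the input does not matter.
flat = data.ravel()

# split() with no argument treats spaces, newlines and '\xa0' as separators.
tokens = [word for text in flat for word in text.split()]

# np.unique returns a sorted array of the distinct tokens.
unique_words = np.unique(tokens)
print(unique_words)
```

If a NumPy array is not needed as the result, `set(tokens)` gives the same distinct words without sorting them.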

Answer 1: 1, accepted, 2020-04-17 21:33:57
Answer 2: 0, 2020-04-17 21:46:01