
Find unique words in a list of lists in Python

I have a list of lists that I would like to iterate over with a for loop to build a new list containing only the unique words. This is similar to a question asked previously, but I could not get that solution to work for a list nested within a list.

For example, the nested list is as follows:

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

The desired output would be a single list:

List_Unique = [['is','and','so','he','his','run']]


I have tried the following two variations of code, but both produce a list of repeats:

unique_redundant = [] 
for i in redundant_search:
    redundant_i = [j for j in i if not i in unique_redundant]
    unique_redundant.append(redundant_i)
unique_redundant


unique_redundant = [] 
for list in redundant_search:
    for j in list:
        redundant_j = [i for i in j if not i in unique_redundant]
    unique_redundant.append(length_j)
unique_redundant

Example output for the two (incorrect) variations above:

(I ran the code on my real data set and it returned the same pair of words repeated as lists within a list. The words below are not the actual ones, just an example.)

List_Unique = [['is','and'],['is','and'],['is','and']]

First flatten the list with itertools.chain, then use set to keep only the unique elements and convert the result to a list:

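from itertools import chain

if __name__ == '__main__':
    list_of_lists = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
    # Flatten, deduplicate with set (order is not preserved), then convert to a list.
    print(list(set(chain(*list_of_lists))))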
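
For reference, here is a minimal sketch of the flatten-then-set idea, assuming the input is the nested list from the question. The helper name unique_unordered is hypothetical, and sorted() is used only to make the printed order repeatable, since a set has no defined order.

```python
from itertools import chain


def unique_unordered(list_of_lists):
    """Return the distinct items of a nested list; order is not guaranteed."""
    # chain.from_iterable flattens one level; set drops the duplicates.
    return list(set(chain.from_iterable(list_of_lists)))


ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
words = unique_unordered(ListofList)
print(sorted(words))  # ['and', 'he', 'his', 'is', 'run', 'so']
```

If the original word order matters, this approach is not suitable, because converting to a set discards it.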

Use itertools.chain to flatten the list and dict.fromkeys to keep the unique values in order:

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

from itertools import chain
List_Unique = [list(dict.fromkeys(chain.from_iterable(ListofList)))]
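
As a rough illustration of why this keeps the order, the sketch below wraps the same two calls in a function (the name unique_in_order is hypothetical). It relies on dict keys being unique and on dicts preserving insertion order, which the language guarantees from Python 3.7 onward.

```python
from itertools import chain


def unique_in_order(list_of_lists):
    """Return distinct items in order of first appearance."""
    # dict.fromkeys keeps only the first occurrence of each key.
    return list(dict.fromkeys(chain.from_iterable(list_of_lists)))


ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
flat = unique_in_order(ListofList)
print(flat)    # ['is', 'and', 'so', 'he', 'his', 'run']
print([flat])  # [['is', 'and', 'so', 'he', 'his', 'run']]
```

The second print wraps the result in an outer list, which matches the nested form shown in the question.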

You could try this:

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
uniqueItems = []
for firstList in ListofList:
    for item in firstList:
        if item not in uniqueItems:
            uniqueItems.append(item)
print(uniqueItems)

It uses a nested for loop to access each item and check whether it is already in uniqueItems.
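
A possible variation on the nested loop, not part of the answer above: checking membership in a list scans the list each time, so for large inputs one could track already-seen words in a set instead. This is only a sketch; the names unique_preserving_order and seen are hypothetical, and it assumes the items are hashable (strings are).

```python
def unique_preserving_order(list_of_lists):
    """Collect items in order of first appearance, using a set for lookups."""
    seen = set()   # fast membership checks
    result = []    # keeps the order of first appearance
    for sublist in list_of_lists:
        for item in sublist:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
print(unique_preserving_order(ListofList))
# ['is', 'and', 'so', 'he', 'his', 'run']
```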

Step through the nested list by index with a while loop (while cnt < len(ListofList)) and collect every value that is not yet in the new list:

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
list_new=[]
cnt=0
while cnt<len(ListofList):
    for i in ListofList[cnt]:
        if i in list_new:
            continue
        else:
            list_new.append(i)
    cnt+=1
print(list_new)

Output

['is', 'and', 'so', 'he', 'his', 'run']

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
flat_list = [item for sublist in ListofList for item in sublist]

# use this if the original order must be preserved
List_Unique = []
for item in flat_list:
    if item not in List_Unique:
        List_Unique.append(item)


# use this if order is not an issue
# List_Unique = list(set(flat_list))

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
list_new=[]
cnt=0
while cnt<len(ListofList):
    for i in ListofList[cnt]:
        if i in list_new:
            continue
        else:
            list_new.append(i)
    cnt+=1
print([list_new])

Output

[['is', 'and', 'so', 'he', 'his', 'run']]

Use the basic set concept: a set contains only unique elements.

lst = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
new_list = []
for x in lst:
    for y in set(x):
        new_list.append(y)
print(list(set(new_list)))

Output (the order may vary, because sets are unordered):

['run', 'and', 'is', 'so', 'he', 'his']

I'd suggest using the union() method of the set class in this way:

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

set().union(*ListofList)
# => {'run', 'and', 'so', 'is', 'his', 'he'}

Explanation

It works like the following:

test_set = set().union([1])
print(test_set)
# => {1}

The asterisk operator before the list ( *ListofList ) unpacks the list:

lst = [[1], [2], [3]]
print(lst) #=> [[1], [2], [3]]
print(*lst) #=> [1] [2] [3]
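
Putting the two pieces together, the sketch below checks that unpacking with the asterisk is the same as passing each sublist to union() by hand. The variable names unpacked and explicit are only illustrative, and sorted() is added here just to get a list with a repeatable order out of the set.

```python
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

# The asterisk passes each sublist as a separate argument to union().
unpacked = set().union(*ListofList)
explicit = set().union(ListofList[0], ListofList[1], ListofList[2])
print(unpacked == explicit)  # True

# A set is unordered; sort it if a list with a stable order is needed.
print(sorted(unpacked))  # ['and', 'he', 'his', 'is', 'run', 'so']
```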
