
Python: Extract list from inside the list and remove duplicates

I have a dataframe with a column that consists of lists of lists (of varying length). One example: df['east'][0] gives

[array(['Indonesia', 'New Zealand'], dtype=object), array(['Indonesia', 'New Zealand'], dtype=object)]

I want to merge the lists inside this bigger list, get rid of duplicates, and present the data clearly, i.e. simply

['Indonesia', 'New Zealand']

I tried some suggestions from here to remove duplicates, but, for example, for np.unique(functools.reduce(operator.add, east)) Python said "ValueError: operands could not be broadcast together with shapes (4,) (13,)".

I can usually solve problems like this, but here I am not sure what is happening: what are these arrays in the list?

One simple approach would be to flatten your lists/arrays with a comprehension and then use list(set()) to get the unique values as a list:

df['east'].apply(lambda x: list(set(item for sublist in x for item in sublist)))
# example output: ['New Zealand', 'Indonesia']
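To make the approach above easy to try, here is a minimal self-contained sketch. The sample DataFrame and the helper name flatten_unique are assumptions made up for illustration; only the comprehension itself comes from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each cell holds a list of NumPy object arrays,
# mirroring the structure shown in the question.
df = pd.DataFrame({
    "east": [
        [np.array(["Indonesia", "New Zealand"], dtype=object),
         np.array(["Indonesia", "New Zealand"], dtype=object)],
        [np.array(["Japan"], dtype=object),
         np.array(["Japan", "Fiji", "Samoa"], dtype=object)],
    ]
})


def flatten_unique(cell):
    """Flatten a list of arrays into one list without duplicates.

    A set does not keep insertion order, so the order of the result
    is not guaranteed.
    """
    return list(set(item for sublist in cell for item in sublist))


result = df["east"].apply(flatten_unique)

# Sort only for display, so the printed order is stable.
for row in result:
    print(sorted(row))
```

If the order matters, wrapping the result in sorted(), as the print loop does, is one simple way to make it deterministic.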

You can use the following one-liner to achieve your result.

from functools import reduce  # reduce is not a builtin in Python 3
df['east'].apply(lambda value: reduce(lambda a, x: list(set(list(a) + list(x))), value, []))

Let's break it down:

list(a) + list(x) = converts both operands to Python lists, so + concatenates them into one list. This avoids the shape error: + on NumPy arrays works element by element, so it only succeeds when the shapes are compatible, and even then it does not concatenate.

list(set(list(a) + list(x))) = a list of all unique elements, obtained by first taking the set of the concatenated list. The order of the result is not guaranteed.

reduce(lambda a, x: list(set(list(a) + list(x))), value, []) = starts from an empty list and, for each array x in value, merges it into the accumulator a, so that everything is reduced to one single list.
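The breakdown above can be checked step by step with a small sketch. The sample arrays in value are made up for illustration and are not the asker's data.

```python
from functools import reduce

import numpy as np

# Hypothetical cell value: two object arrays of different lengths.
value = [
    np.array(["Indonesia", "New Zealand"], dtype=object),
    np.array(["Indonesia", "Fiji", "Samoa"], dtype=object),
]

# `+` on NumPy arrays is element-wise, so shapes (2,) and (3,) cannot be added.
try:
    value[0] + value[1]
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# `+` on Python lists concatenates, whatever the lengths are.
concatenated = list(value[0]) + list(value[1])

# Fold over the arrays, de-duplicating the accumulator at every step.
merged = reduce(lambda a, x: list(set(list(a) + list(x))), value, [])

print(broadcast_failed)   # True: the array addition raised ValueError
print(len(concatenated))  # 5: duplicates are still present at this point
print(sorted(merged))     # sorted only to make the printed order stable
```

This is the same error the question ran into with functools.reduce(operator.add, east): operator.add on two arrays is array addition, not list concatenation.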
