Python: Extract list from inside the list and remove duplicates
I have a dataframe with a column that consists of lists of lists (of varying length). One example: df['east'][0] gives
[array(['Indonesia', 'New Zealand'], dtype=object), array(['Indonesia', 'New Zealand'], dtype=object)]
I want to merge the lists inside this bigger list, get rid of duplicates, and make sure the data is presented clearly, i.e. simply
['Indonesia', 'New Zealand']
I tried some suggestions from here to remove duplicates, but, for example, for np.unique(functools.reduce(operator.add, east)) Python said "ValueError: operands could not be broadcast together with shapes (4,) (13,)"
I can usually solve problems, but here I am not sure what is happening. What are these arrays in the list?
One simple approach would be to flatten your lists/arrays with a comprehension and then use list(set()) to get the unique values as a list:
df['east'].apply(lambda x: list(set(item for sublist in x for item in sublist)))
# example output: ['New Zealand', 'Indonesia']
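Because a set does not keep the original order, the output order of the approach above can vary. A minimal sketch of an order-preserving variant follows, assuming the cell looks like the one in the question; the sample data and the helper name flatten_unique are hypothetical and only for illustration.

```python
import numpy as np

# Hypothetical cell value mimicking df['east'][0] from the question
cell = [
    np.array(["Indonesia", "New Zealand"], dtype=object),
    np.array(["Indonesia", "New Zealand", "Fiji"], dtype=object),
]

def flatten_unique(list_of_arrays):
    """Flatten nested arrays/lists, dropping duplicates but keeping first-seen order."""
    flat = (item for sublist in list_of_arrays for item in sublist)
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(flat))

print(flatten_unique(cell))  # ['Indonesia', 'New Zealand', 'Fiji']
```

If the column is laid out as in the question, the helper could presumably be applied row by row with df['east'].apply(flatten_unique).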
You can use the following one-liner to achieve your results (reduce is imported from functools).
df['east'].apply(lambda value: reduce(lambda a, x: list(set(list(a) + list(x))), value, []))
Let's break it down...
list(a) + list(x)
= avoids the shape error and concatenates the two lists into one (adding the np arrays directly only works if their shapes match, and even then it adds them element-wise rather than concatenating them)
list(set(list(a) + list(x)))
= list of all unique elements, obtained by first taking their set.
reduce(lambda a, x: list(set(list(a) + list(x))), value, [])
= repeatedly merges the accumulator with the next array in the list, reducing everything into one single list.
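The breakdown above can be checked with a small runnable sketch. It uses hypothetical arrays of different lengths to reproduce the broadcast error from the question, then applies the reduce expression, and finally shows np.concatenate followed by np.unique as an alternative that is not part of the original answers.

```python
from functools import reduce
import operator

import numpy as np

# Hypothetical arrays of different lengths, similar to the failing case
value = [
    np.array(["Indonesia", "New Zealand"], dtype=object),
    np.array(["Indonesia", "Fiji", "Samoa"], dtype=object),
]

# operator.add on NumPy arrays is element-wise, so mismatched shapes fail
try:
    reduce(operator.add, value)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Converting to lists first makes + mean concatenation
merged = reduce(lambda a, x: list(set(list(a) + list(x))), value, [])

# Alternative: concatenate, then let np.unique deduplicate (result is sorted)
unique_sorted = np.unique(np.concatenate(value)).tolist()

print(broadcast_failed)  # True
print(sorted(merged))    # ['Fiji', 'Indonesia', 'New Zealand', 'Samoa']
print(unique_sorted)     # ['Fiji', 'Indonesia', 'New Zealand', 'Samoa']
```

The order of merged itself is not guaranteed because it passes through a set, which is why the sketch sorts it before printing.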