
Sort lists in a Pandas DataFrame column

I have a DataFrame column which is a collection of lists:

    a
['a', 'b']
['b', 'a']
['a', 'c']
['c', 'a']

I would like to use this column to group by its unique values (['a', 'b'] and ['a', 'c']). However, this generates an error:

TypeError: unhashable type: 'list'

Is there any way around this? Ideally I would like to sort the values in place and create an additional column containing a concatenated string.

You can also sort values by column.

Example:

import pandas

x = [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]
df = pandas.DataFrame({'a': pandas.Series(x)})
df.a.sort_values()

     a
0   [a, b]
2   [a, c]
1   [b, a]
3   [c, a]

However, from what I understand, you want to sort [b, a] into [a, b] and [c, a] into [a, c], and then take the set of the values so that only [a, b] and [a, c] remain.

I'd recommend using a lambda.

Try:

result = df.a.sort_values().apply(lambda x: sorted(x))
result = pandas.DataFrame(result).reset_index(drop=True)

It returns:

0    [a, b]
1    [a, c]
2    [a, b]
3    [a, c]

Then get the unique values:

newdf = pandas.DataFrame({'a': pandas.Series(list(set(result['a'].apply(tuple))))})
newdf.sort_values(by='a')

     a
0   (a, b)
1   (a, c)
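
The same steps (sort each list, convert to a tuple, keep the unique values) can also be sketched without going through a Python `set`, so that the result order does not depend on set iteration order. This is only one possible variant, not the answer's own code; the variable names `normalized` and `unique_pairs` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]})

# Sort each list, then convert it to a tuple so pandas can hash it.
normalized = df['a'].apply(lambda values: tuple(sorted(values)))

# drop_duplicates keeps the first occurrence of each tuple,
# so the remaining rows follow the original row order.
unique_pairs = normalized.drop_duplicates().reset_index(drop=True)

print(unique_pairs.tolist())
```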

Lists are unhashable. Tuples, however, are hashable.
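
The difference can be seen in plain Python, without pandas. The sketch below is illustrative only: `is_hashable` is a hypothetical helper, and the dictionary count stands in for what a group-by does with its keys.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(['a', 'b']))   # lists are mutable, so they cannot be hashed
print(is_hashable(('a', 'b')))   # tuples of hashable items can be hashed

# Grouping needs hashable keys; sorted tuples make ['a', 'b'] and ['b', 'a'] equal.
counts = {}
for item in [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]:
    key = tuple(sorted(item))
    counts[key] = counts.get(key, 0) + 1

print(counts)
```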

Use:

df.groupby([df.a.apply(tuple)])

Setup:
import pandas as pd
df = pd.DataFrame(dict(a=[list('ab'), list('ba'), list('ac'), list('ca')]))
Results:
df.groupby([df.a.apply(tuple)]).size()

a
(a, b)    1
(a, c)    1
(b, a)    1
(c, a)    1
dtype: int64
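
Note that grouping on the unsorted tuples keeps (a, b) and (b, a) apart. To get the two groups the question asks for, along with the concatenated string column, one option is to sort each list first. The following is a minimal sketch under that assumption; the column name `key` is invented for illustration and is not from the original posts.

```python
import pandas as pd

df = pd.DataFrame({'a': [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]})

# Replace every list with a sorted copy of itself.
df['a'] = df['a'].apply(sorted)

# Build a concatenated string from each sorted list; strings are hashable,
# so this column ('key' is a hypothetical name) can be used for grouping.
df['key'] = df['a'].apply(lambda values: ''.join(values))

sizes = df.groupby('key').size()
print(sizes)
```

Joining without a separator is fine for single-character items; for longer strings a separator such as `'|'` would avoid ambiguous keys.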
