Get unique values from a list stored as a value in a pandas column (Python)

Question

I have a dataframe with many rows and columns; here is a one-row example:

id    values 
1   [v1, v2, v1]

How can I get the unique values from the list in a pandas column? The desired output in the second column is v1, v2. I have tried df['values'].unique(), but obviously it does not work.

Answer 1

A simple solution is to aggregate with pd.unique, i.e.:

import pandas as pd

df = pd.DataFrame({'x': [['v', 'w', 'x', 'v', 'x']]})

df['x'].agg(pd.unique)  # Also np.unique

0    [v, w, x]
Name: x, dtype: object

or

df['x'].agg(set).agg(list)

0    [v, w, x]
Name: x, dtype: object
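
One caveat with the set-based variant: a Python set does not guarantee element order, so the list it produces may not follow the original order. A minimal sketch of an order-preserving alternative, assuming the list elements are hashable, uses dict.fromkeys (the column name unique_x is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [['v', 'w', 'x', 'v', 'x']]})

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed for dicts since Python 3.7), unlike set.
# 'unique_x' is a hypothetical column name used only for this example.
df['unique_x'] = df['x'].apply(lambda items: list(dict.fromkeys(items)))

print(df['unique_x'].iloc[0])  # ['v', 'w', 'x']
```

This is a swapped-in technique, not the one from the answer; it only matters when the order of first appearance has to be kept.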

Answer 2

Another option, using map with set:

df['new']=list(map(set,df['values'].values))
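
Note that the line above stores Python sets in the new column, not lists. The sketch below shows this on the one-row example from the question and, as an assumption about what may be wanted, adds a second column of sorted lists (new_sorted is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({'id': [1], 'values': [['v1', 'v2', 'v1']]})

# As in the answer: each cell of 'new' becomes a Python set.
df['new'] = list(map(set, df['values'].values))

# Hypothetical extra column: sorted lists, which gives a deterministic order.
df['new_sorted'] = [sorted(s) for s in df['new']]

print(type(df['new'].iloc[0]))    # <class 'set'>
print(df['new_sorted'].iloc[0])   # ['v1', 'v2']
```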

Timing

%timeit df['values'].agg(np.unique)
The slowest run took 6.78 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 6.99 ms per loop
%timeit list(map(set,df['values'].values))
The slowest run took 55.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 228 µs per loop
%timeit df['values'].apply(lambda x: list(set(x)))
1000 loops, best of 3: 743 µs per loop

Answer 3

Try:

df['values'] = df['values'].apply(lambda x: list(set(x)))


    id  values
0   1   [v2, v1]

Note: values is a pandas attribute, so it's better to avoid using it as a column name.
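
To illustrate the naming clash, here is a small sketch on the example frame. Attribute access (df.values) returns the frame's underlying NumPy array, while bracket access (df['values']) returns the column. The replacement name vals is only an example:

```python
import pandas as pd

df = pd.DataFrame({'id': [1], 'values': [['v1', 'v2', 'v1']]})

# Attribute access hits the DataFrame.values property, not the column.
arr = df.values
print(type(arr).__name__)   # ndarray

# Bracket access always returns the column.
col = df['values']
print(type(col).__name__)   # Series

# Renaming the column ('vals' is a hypothetical name) makes
# attribute access refer to the column again.
df = df.rename(columns={'values': 'vals'})
print(df.vals.iloc[0])      # ['v1', 'v2', 'v1']
```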

Time comparison:

df = pd.DataFrame({'id': [1]*1000, 'values': [['v1', 'v2', 'v1']]*1000})
%timeit df['values'].agg(np.unique)

34.7 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit df['values'].apply(lambda x: list(set(x)))

1.98 ms ± 259 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
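
The %timeit lines above only run inside IPython or Jupyter. A rough sketch of a comparable measurement in a plain Python script, using the standard timeit module, could look like this. The helper names with_apply and with_map are made up for the example, and the numbers it prints will differ from those quoted above depending on machine and pandas version:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'id': [1] * 1000, 'values': [['v1', 'v2', 'v1']] * 1000})


def with_apply():
    # Approach from Answer 3: one list of unique items per row.
    return df['values'].apply(lambda x: list(set(x)))


def with_map():
    # Approach from Answer 2: one set per row.
    return list(map(set, df['values'].values))


runs = 20  # arbitrary repeat count chosen for this sketch
for fn in (with_apply, with_map):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```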

3 answers

Answer 1
2 2017-12-06 16:44:32

Answer 2
2 2017-12-06 17:24:49

Answer 3
1 2017-12-06 16:27:38
