Get unique values from lists stored in a pandas column (Python)
I have a DataFrame with more rows and columns than this, but here is an example:
id values
1 [v1, v2, v1]
How can I get the unique values from the lists in a pandas column? I tried df['values'].unique(), hoping to get v1, v2 in the second column, but it clearly does not work.
A simple solution is to pass pd.unique to agg, i.e.
import numpy as np
import pandas as pd
df = pd.DataFrame({'x' : [['v','w','x','v','x']]})
df['x'].agg(pd.unique) # Also np.unique
0    [v, w, x]
Name: x, dtype: object
Or:
df['x'].agg(set).agg(list)
0 [v, w, x]
Name: x, dtype: object
Alternatively, with map (note that this stores sets rather than lists):
df['new']=list(map(set,df['values'].values))
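The set-based variants above do not guarantee that the original order of the elements is kept. If order matters, one possible approach is to deduplicate with dict.fromkeys, which keeps the first occurrence of each element. This is only a sketch: the column name unique_values is an illustrative assumption, and the elements must be hashable.

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "values": [["v1", "v2", "v1"]]})

# dict.fromkeys keeps keys in insertion order, so the first occurrence
# of each element survives and later duplicates are dropped.
# "unique_values" is a hypothetical column name chosen for this example.
df["unique_values"] = df["values"].apply(lambda xs: list(dict.fromkeys(xs)))

print(df["unique_values"].iloc[0])  # ['v1', 'v2']
```

Unlike list(set(x)), the result here always starts with v1 because v1 appears first in the input list.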
Timings:
%timeit df['values'].agg(np.unique)
The slowest run took 6.78 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 6.99 ms per loop
%timeit list(map(set,df['values'].values))
The slowest run took 55.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 228 µs per loop
%timeit df['values'].apply(lambda x: list(set(x)))
1000 loops, best of 3: 743 µs per loop
Try:
df['values'] = df['values'].apply(lambda x: list(set(x)))
   id    values
0   1  [v2, v1]
Note: values is a pandas attribute, so it is better to avoid using it as a column name.
Timing comparison:
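To illustrate the note about the column name, here is a small sketch of what happens with attribute access. The replacement name tags is an arbitrary assumption used only for this example.

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "values": [["v1", "v2", "v1"]]})

# Attribute access resolves to DataFrame.values, a NumPy array holding
# the whole frame, not the column that happens to be named "values".
print(type(df.values).__name__)  # ndarray
print(df.values.shape)           # (1, 2)

# Bracket access always returns the column itself.
print(type(df["values"]).__name__)  # Series

# Renaming the column removes the ambiguity ("tags" is a hypothetical name).
renamed = df.rename(columns={"values": "tags"})
print(renamed.tags.iloc[0])  # ['v1', 'v2', 'v1']
```

Bracket access such as df['values'] keeps working either way, so the rename mainly matters when dot notation is used.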
df= pd.DataFrame({'id':[1]*1000, 'values' :[['v1', 'v2', 'v1']]*1000})
%timeit df['values'].agg(np.unique)
34.7 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit df['values'].apply(lambda x: list(set(x)))
1.98 ms ± 259 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
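The %timeit lines above only run in IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The candidates dictionary and its labels are assumptions made for this example, and apply is used in place of agg because newer pandas releases may warn when agg receives an element-wise function. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1] * 1000, "values": [["v1", "v2", "v1"]] * 1000})

# Hypothetical labels mapped to zero-argument callables to be timed.
candidates = {
    "np.unique per row": lambda: df["values"].apply(np.unique),
    "list(set(x)) per row": lambda: df["values"].apply(lambda x: list(set(x))),
    "map(set, ...)": lambda: list(map(set, df["values"].values)),
}

for name, fn in candidates.items():
    # Take the best of 3 repeats, each running the callable 10 times.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```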