How can I create a sorted list of values from multiple columns in pandas?

Question

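I have a DataFrame in which columns A and B can hold the same pair of values once each row is sorted. I want to deduplicate on these columns, because the order of the two values does not matter in my application.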

Here is a sample DataFrame:

import pandas as pd
df = pd.DataFrame({'col1':[1, 2, 3], 'col2':[2, 1, 4]})
print(df)

The DataFrame looks like this:

index col1  col2
0     1     2
1     2     1
2     3     4

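What I want is a new column that holds the two values of each row in sorted order, so that I can deduplicate the DataFrame based on that column.

The key column would look like this: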

0   [1, 2]
1   [1, 2]
2   [3, 4]

Then I would call df.drop_duplicates('col3').

My guess is that I should use .apply or .map, perhaps with a lambda function, but nothing I have tried so far has worked:

df.apply(lambda row: sorted([row[0], row[1]]), axis=1) # this sorts the column values in place but doesn't create a new column with a list
sorted([df['col1'], df['col2']]) # returns error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
df.map(sorted) # dataframe object has no attribute map
df[['col1', 'col2']].apply(lambda x:
    sorted([','.join(x.astype(int).astype(str))]), axis=1) # creates a list but is not sorted

Thanks for your help. I would appreciate a solution that also explains why it works.

Answer 1

Option 1

Use df.apply and pass sorted:

In [1234]: df['col3'] = df.apply(tuple, 1).apply(sorted).apply(tuple)

In [1235]: df.drop_duplicates('col3')
Out[1235]: 
   col1  col2    col3
0     1     2  (1, 2)
2     3     4  (3, 4)
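
To unpack why the chain in Option 1 works, here is a minimal sketch that runs each stage separately on the question's sample frame. The intermediate names (row_tuples, sorted_lists, keys) are illustrative and not part of the answer. It assumes a reasonably recent pandas, where apply with axis=1 returns a Series when the function returns a tuple.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 1, 4]})

# Stage 1: axis=1 passes each row to tuple(), giving a Series of tuples.
row_tuples = df[['col1', 'col2']].apply(tuple, axis=1)

# Stage 2: sorted() orders the values of each tuple, but it returns a list.
sorted_lists = row_tuples.apply(sorted)

# Stage 3: convert back to tuples. Lists are unhashable, and
# drop_duplicates needs hashable values to compare rows.
keys = sorted_lists.apply(tuple)

df['col3'] = keys
deduped = df.drop_duplicates('col3')
print(deduped)
```

The final conversion to tuples is the step that matters for deduplication: a column of lists would make drop_duplicates raise a TypeError about an unhashable type.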

Option 2

Call np.sort on df.values, then assign the result to a new column.

In [1208]: df['col3'] = pd.Series([tuple(x) for x in np.sort(df.values, 1)]); df
Out[1208]: 
   col1  col2    col3
0     1     2  (1, 2)
1     2     1  (1, 2)
2     3     4  (3, 4)

In [1210]: df.drop_duplicates('col3')
Out[1210]: 
   col1  col2    col3
0     1     2  (1, 2)
2     3     4  (3, 4)
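
Two things to watch in Option 2: df.values covers every column of the frame, and assigning a pd.Series aligns on the index. A hedged variant, assuming only col1 and col2 should form the key, selects those columns explicitly and reuses df.index. The non-default index labels below are made up to show the alignment.

```python
import numpy as np
import pandas as pd

# Hypothetical non-default index labels, to show the alignment.
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 1, 4]},
                  index=['a', 'b', 'c'])

# Sort the values within each row of the two key columns (axis=1).
sorted_values = np.sort(df[['col1', 'col2']].to_numpy(), axis=1)

# Build hashable tuple keys on the same index as df.
df['col3'] = pd.Series([tuple(row) for row in sorted_values], index=df.index)

deduped = df.drop_duplicates('col3')
print(deduped)
```

Without index=df.index, the new Series would carry the labels 0, 1, 2, which do not match 'a', 'b', 'c', so col3 would be filled with NaN.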

Answer 2

Three steps:

df['x'] = df.apply(lambda x: tuple(sorted(x)), axis=1)
df = df.drop_duplicates('x')
del df['x']
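
If the helper column exists only to be deleted again, an alternative (not what this answer does) is to compute a boolean mask with duplicated on a row-sorted copy and filter with it. A minimal sketch, assuming the key is col1 and col2; sorted_pairs and mask are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 1, 4]})

# Row-sorted copy of the key columns, kept on df's index.
sorted_pairs = pd.DataFrame(
    np.sort(df[['col1', 'col2']].to_numpy(), axis=1),
    index=df.index,
)

# True for every row whose sorted pair already appeared in an earlier row.
mask = sorted_pairs.duplicated()

# Keep only the first occurrence of each unordered pair.
deduped = df[~mask]
print(deduped)
```

This leaves df itself unchanged and never stores tuples in a column, which can help when the frame is large.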
