How can I find unique combinations of 2 columns, delete the non-unique combinations, and keep only the first rows in pandas?
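I have a dataset with 2 columns that holds combinations of values. I want to find the combinations that are not unique, treating (a, b) and (b, a) as the same combination, and delete them, keeping only the first row of each.
So this is the dataset: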
import pandas as pd

dim_set = [
    ('Customer group$Large', 'DEPARTMENT$Sales'),
    ('Customer group$Medium', 'DEPARTMENT$Sales'),
    ('Customer group$Small', 'DEPARTMENT$Sales'),
    ('DEPARTMENT$Sales', 'Customer group$Large'),
    ('DEPARTMENT$Sales', 'Customer group$Medium'),
    ('DEPARTMENT$Sales', 'Customer group$Small'),
]
df = pd.DataFrame(dim_set, columns=['dim', 'linked_dim'])
df
The expected output should contain only the first row of each combination.
I believe you need to sort the values within each row and then drop the duplicates:
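import numpy as np

# Sort the two values in every row (axis=1) so that (a, b) and (b, a)
# become identical, then keep only the first occurrence of each pair.
df = (pd.DataFrame(np.sort(df[['dim', 'linked_dim']], axis=1),
                   columns=['dim', 'linked_dim'])
        .drop_duplicates())
print(df)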
                     dim        linked_dim
0   Customer group$Large  DEPARTMENT$Sales
1  Customer group$Medium  DEPARTMENT$Sales
2   Customer group$Small  DEPARTMENT$Sales
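One side effect of sorting is that the values inside each surviving row are rewritten in sorted order, so a row stored as (b, a) would come back as (a, b). The following is a minimal sketch, assuming you would rather keep the surviving rows exactly as they were: the sorted values are used only as a comparison key, and the original frame is filtered with a boolean mask. The names `keys` and `result` are illustrative and not part of the question.

```python
import numpy as np
import pandas as pd

dim_set = [
    ('Customer group$Large', 'DEPARTMENT$Sales'),
    ('Customer group$Medium', 'DEPARTMENT$Sales'),
    ('Customer group$Small', 'DEPARTMENT$Sales'),
    ('DEPARTMENT$Sales', 'Customer group$Large'),
    ('DEPARTMENT$Sales', 'Customer group$Medium'),
    ('DEPARTMENT$Sales', 'Customer group$Small'),
]
df = pd.DataFrame(dim_set, columns=['dim', 'linked_dim'])

# Build an order-independent key: sort the two values within each row.
# Reusing df.index keeps the mask aligned with the original frame.
keys = pd.DataFrame(
    np.sort(df[['dim', 'linked_dim']].to_numpy(), axis=1),
    index=df.index,
)

# duplicated() flags every repeat after the first occurrence of a key;
# negating it keeps the first row of each combination, unmodified.
result = df[~keys.duplicated()]
print(result)
```

For this particular dataset the result matches the sorted version, because the first occurrence of each pair already happens to be in sorted order. The two approaches differ only when the first occurrence is stored the other way round.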
I think this will work for you:
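import numpy as np

# np.sort sorts along the last axis by default, which for a 2-D input
# means within each row, so this is equivalent to passing axis=1.
df = (pd.DataFrame(np.sort(df[['dim', 'linked_dim']]),
                   columns=['dim', 'linked_dim'])
        .drop_duplicates())
print(df)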