How to drop duplicate columns from a pandas dataframe, based on the columns' values (the columns don't have the same name)?
I want to drop columns whose values are the same as those of another column. From DF, it should produce DF_new:
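import numpy as np
import pandas as pd
from IPython.display import display  # display() is built in when running in Jupyter

DF = pd.DataFrame(index=[1, 2, 3, 4], columns=['col1', 'col2', 'col3', 'col4', 'col5'])
x = np.random.uniform(size=4)
DF['col1'] = x
DF['col2'] = x + 2
DF['col3'] = x       # same values as col1
DF['col4'] = x + 2   # same values as col2
DF['col5'] = [5, 6, 7, 8]
display(DF)

# Desired result: keep only the first column of each group of identical columns
DF_new = DF[['col1', 'col2', 'col5']]
display(DF_new)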
Note that the column names are all different, so I can't use:
DF_new = DF.loc[:,~DF.columns.duplicated()].copy()
which drops columns based on their names.
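To see why the name-based approach does nothing here, the sketch below checks `DF.columns.duplicated()` on a frame built like the one in the question. It assumes a fixed, hypothetical array in place of the random `x` so the result is reproducible. Every label is unique, so the mask is all `False` and no column is removed.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values standing in for np.random.uniform(size=4)
x = np.array([0.1, 0.2, 0.3, 0.4])
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# Index.duplicated() only looks at the column *labels*, not the values
name_mask = DF.columns.duplicated()
print(name_mask)  # all False: every column name is unique

# So the name-based filter keeps every column, including col3 and col4
by_name = DF.loc[:, ~name_mask].copy()
print(list(by_name.columns))
```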
You can use:
df = df.T.drop_duplicates().T
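The one-liner above can be wrapped in a small helper and run end to end. This is a minimal sketch: the function name `drop_duplicate_columns` and the fixed array `x` are hypothetical, chosen only for illustration. Note that duplicates are detected by exact equality, and that transposing forces all values into one common dtype, so the integer column `col5` comes back as floats.

```python
import numpy as np
import pandas as pd

def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose values repeat an earlier column."""
    # Transpose so each column becomes a row, drop repeated rows
    # (keeping the first occurrence), then transpose back.
    return df.T.drop_duplicates().T

x = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical fixed data
df = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

df_new = drop_duplicate_columns(df)
print(df_new)
print(df_new.dtypes)  # col5 comes back as float64, not int64
```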
Step by step:
df2 = df.T  # .T transposes the frame: each original column becomes a row
'''
1 2 3 4
col1 0.67075 0.707864 0.206923 0.168023
col2 2.67075 2.707864 2.206923 2.168023
col3 0.67075 0.707864 0.206923 0.168023
col4 2.67075 2.707864 2.206923 2.168023
col5 5.00000 6.000000 7.000000 8.000000
'''
# Now drop_duplicates() removes repeated rows, i.e. the duplicate columns (the first occurrence is kept)
df2 = df2.drop_duplicates()
'''
1 2 3 4
col1 0.67075 0.707864 0.206923 0.168023
col2 2.67075 2.707864 2.206923 2.168023
col5 5.00000 6.000000 7.000000 8.000000
'''
# Then transpose again to turn the rows back into columns
df2 = df2.T
'''
col1 col2 col5
1 0.670750 2.670750 5.0
2 0.707864 2.707864 6.0
3 0.206923 2.206923 7.0
4 0.168023 2.168023 8.0
'''
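One side effect of transposing twice is visible in the output above: col5 comes back as floats (5.0) because the transposed frame has to hold all values in one common dtype. A possible variant, sketched below with the same hypothetical fixed data, uses the transposed frame only to compute a boolean mask with `DataFrame.duplicated()`, then selects the surviving columns from the original frame so each kept column retains its original dtype. It still relies on exact equality, so columns that differ only by floating-point noise are not treated as duplicates.

```python
import numpy as np
import pandas as pd

x = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical fixed data
df = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# Each row of df.T is one original column; duplicated() marks every row
# that repeats an earlier one (keep="first" is the default).
dup_mask = df.T.duplicated()

# Select the non-duplicate columns from the ORIGINAL frame, so the kept
# columns are not converted to a common dtype.
df_new = df.loc[:, ~dup_mask.to_numpy()].copy()
print(df_new.dtypes)  # col5 stays int64
```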
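Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.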