How to drop duplicate columns from a pandas DataFrame based on the columns' values (the columns don't have the same name)?
I want to drop columns whose values are identical to those of another column. From DF, it should yield DF_new:
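import numpy as np
import pandas as pd

DF = pd.DataFrame(index=[1, 2, 3, 4], columns=['col1', 'col2', 'col3', 'col4', 'col5'])
x = np.random.uniform(size=4)
DF['col1'] = x
DF['col2'] = x + 2
DF['col3'] = x      # same values as col1
DF['col4'] = x + 2  # same values as col2
DF['col5'] = [5, 6, 7, 8]
display(DF)  # display() is a Jupyter/IPython built-in; use print(DF) in a plain script

# Desired result: keep only the first column of each group of identical columns
DF_new = DF[['col1', 'col2', 'col5']]
display(DF_new)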
The code above is a simple example of what I can't manage to do.
Note that the column names are not the same, so I can't use:
DF_new = DF.loc[:,~DF.columns.duplicated()].copy()
which drops columns based on their names.
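To illustrate why the name-based approach does not help here, the following is a minimal sketch on a small made-up frame (the data and the variable names `name_mask` and `by_name` are illustrative assumptions, not from the original post). `Index.duplicated()` only compares labels, so two columns with identical values but different names are both kept.

```python
import pandas as pd

# Hypothetical toy data: col1 and col3 hold the same values under different names
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [1, 2]})

# duplicated() on the column index looks at the labels only, not the values
name_mask = df.columns.duplicated()
print(name_mask)

# Nothing is dropped, because every label is unique
by_name = df.loc[:, ~name_mask]
print(list(by_name.columns))
```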
You can use:
df = df.T.drop_duplicates().T
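As a self-contained sketch of the one-liner above, here it is wrapped in a small helper and applied to data shaped like the question's example. The function name `drop_duplicate_columns` and the fixed values in `x` are assumptions made for illustration (the question uses random values). Columns are compared for exact equality, and the first column of each identical group is kept.

```python
import numpy as np
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the first of each group of identical columns."""
    # Transpose so columns become rows, drop duplicate rows, transpose back
    return df.T.drop_duplicates().T


# Fixed values instead of np.random.uniform so the result is reproducible
x = np.array([0.1, 0.2, 0.3, 0.4])
df = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

result = drop_duplicate_columns(df)
print(list(result.columns))
```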
Step by step:
df2 = df.T # T = transpose (convert rows to columns)
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col3  0.67075  0.707864  0.206923  0.168023
col4  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''
# now we can use drop_duplicates(), which removes duplicate rows
df2=df2.drop_duplicates()
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''
#then use transpose again.
df2=df2.T
'''
       col1      col2  col5
1  0.670750  2.670750   5.0
2  0.707864  2.707864   6.0
3  0.206923  2.206923   7.0
4  0.168023  2.168023   8.0
'''
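Note that in the final output col5 comes back as 5.0, 6.0, ... rather than integers: transposing a frame with mixed numeric types converts everything to a common dtype. If that matters, one possible variant (a sketch, not part of the original answer) is to transpose only to compute a boolean mask with `duplicated()` and then select columns from the untouched original. The variable names `round_trip`, `keep`, and `masked`, and the fixed sample values, are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [0.1, 0.2, 0.3, 0.4],
        "col2": [2.1, 2.2, 2.3, 2.4],
        "col3": [0.1, 0.2, 0.3, 0.4],  # same values as col1
        "col4": [2.1, 2.2, 2.3, 2.4],  # same values as col2
        "col5": [5, 6, 7, 8],          # integer column
    },
    index=[1, 2, 3, 4],
)

# Double transpose: the integer column is converted to float along the way
round_trip = df.T.drop_duplicates().T

# Mask variant: transpose only to find duplicates, then select from the original
keep = ~df.T.duplicated()   # boolean Series indexed by column name
masked = df.loc[:, keep]    # original dtypes are preserved

print(round_trip.dtypes)
print(masked.dtypes)
```

Both variants keep the same columns; they differ only in the dtypes of the result.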