How to drop duplicate columns from a pandas DataFrame based on the columns' values (when the column names differ)?

Question

I want to drop columns whose values are identical to those of another column. Starting from DF, the result should be DF_new:

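import numpy as np
import pandas as pd

DF = pd.DataFrame(index=[1, 2, 3, 4], columns=['col1', 'col2', 'col3', 'col4', 'col5'])
x = np.random.uniform(size=4)
DF['col1'] = x
DF['col2'] = x + 2
DF['col3'] = x      # same values as col1
DF['col4'] = x + 2  # same values as col2
DF['col5'] = [5, 6, 7, 8]
display(DF)  # display() is available in Jupyter/IPython

# desired result: only the first column of each set of identical columns
DF_new = DF[['col1', 'col2', 'col5']]
display(DF_new)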

The above is a simple example of what I can't manage to do.

Note that the column names are not the same, so I can't use:

DF_new = DF.loc[:,~DF.columns.duplicated()].copy()

which drops columns based on their names.

Answer 1

You can use:

df = df.T.drop_duplicates().T

Step by step:

df2 = df.T  # T = transpose (rows become columns and vice versa)
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col3  0.67075  0.707864  0.206923  0.168023
col4  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# now we can use drop_duplicates
df2 = df2.drop_duplicates()
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# then transpose again
df2 = df2.T
'''
       col1      col2  col5
1  0.670750  2.670750   5.0
2  0.707864  2.707864   6.0
3  0.206923  2.206923   7.0
4  0.168023  2.168023   8.0
'''
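
A variation that may be worth considering: transposing twice can change column dtypes (col5 comes back as float in the output above). The following is a minimal sketch, not part of the original answer, that builds a boolean mask with df.T.duplicated() and selects from the original frame instead. It assumes the same column layout as the question, with fixed values in place of the random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Fixed values stand in for the question's random data (an assumption for reproducibility)
x = np.array([0.1, 0.2, 0.3, 0.4])
df = pd.DataFrame(
    {'col1': x, 'col2': x + 2, 'col3': x, 'col4': x + 2, 'col5': [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# True for every column whose values repeat an earlier column
is_dup = df.T.duplicated()

# Select from the original frame, so the kept columns keep their dtypes
df_new = df.loc[:, ~is_dup].copy()
print(df_new.columns.tolist())  # ['col1', 'col2', 'col5']
```

Because the selection happens on the untransposed frame, col5 stays an integer column here.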

Answer 2

This should do what you need:

df = df.loc[:, ~df.apply(lambda x: x.duplicated(), axis=1).all()].copy()

as you can see from this link.
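
One caveat, offered as an observation rather than a correction: the row-wise check above flags a value when it repeats any earlier value in the same row, so a column might be dropped without matching any single column in full. The following is a minimal sketch of a stricter pairwise comparison using Series.equals. The helper name drop_duplicate_value_columns and the small test frame are hypothetical and not part of the original answer.

```python
import pandas as pd


def drop_duplicate_value_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first of each group of columns with identical values (hypothetical helper)."""
    keep = []  # positions of the columns to keep
    for i in range(df.shape[1]):
        candidate = df.iloc[:, i]
        # Drop the candidate only if it equals one already-kept column in every row.
        # Series.equals also requires matching dtypes, so an int column and a float
        # column holding the same numbers are treated as different.
        if not any(candidate.equals(df.iloc[:, j]) for j in keep):
            keep.append(i)
    return df.iloc[:, keep].copy()


# 'c' repeats a value from 'a' or 'b' in each row, but equals neither column
tricky = pd.DataFrame({'a': [1, 2], 'b': [2, 1], 'c': [1, 1]})
print(drop_duplicate_value_columns(tricky).columns.tolist())  # ['a', 'b', 'c']
```

The pairwise loop compares each column against every kept column, which is fine for a handful of columns but may be slow on very wide frames.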


2 answers

Answer 1
2 2022-11-22 16:42:51

Answer 2
0 2022-11-22 15:42:51
