How to drop duplicate columns from a pandas dataframe, based on columns' values (columns don't have the same name)?

I want to drop columns whose values are identical to the values of another column. From DF, it should yield DF_new:

import numpy as np
import pandas as pd

DF = pd.DataFrame(index=[1,2,3,4], columns=['col1', 'col2', 'col3', 'col4', 'col5'])
x = np.random.uniform(size=4)
DF['col1'] = x
DF['col2'] = x+2
DF['col3'] = x      # same values as col1
DF['col4'] = x+2    # same values as col2
DF['col5'] = [5,6,7,8]
print(DF)           # display(DF) in Jupyter

DF_new = DF[['col1', 'col2', 'col5']]   # expected result
print(DF_new)

This is a simple example of what I can't manage to do.

Note that the column names are different, so I can't use:

DF_new = DF.loc[:,~DF.columns.duplicated()].copy()

That idiom drops columns based on their names, not on their values.
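A quick check shows why the name-based idiom does nothing here. This is a minimal sketch that rebuilds the question's DF with a fixed seed (the seed is an assumption, used only so the example is reproducible):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; seeding is an assumption for reproducibility
rng = np.random.default_rng(0)
x = rng.uniform(size=4)
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# Index.duplicated() only looks at the labels, and all five labels are distinct
name_dups = DF.columns.duplicated()
print(name_dups)  # every entry is False, so nothing would be dropped

# Comparing values instead shows col3 repeats col1 and col4 repeats col2
print(DF["col3"].equals(DF["col1"]), DF["col4"].equals(DF["col2"]))
```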

You can use:

df = df.T.drop_duplicates().T
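The one-liner can be wrapped in a small helper and checked against the question's expected DF_new. The function name drop_duplicate_columns is hypothetical, and the data uses a fixed seed instead of the question's unseeded np.random.uniform:

```python
import numpy as np
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose values repeat an earlier column.

    Transposing turns columns into rows, drop_duplicates() keeps the first
    occurrence of each distinct row, and a second transpose restores the
    original orientation. Column labels are ignored; only values are compared.
    """
    return df.T.drop_duplicates().T


rng = np.random.default_rng(0)  # seeded only so the example is reproducible
x = rng.uniform(size=4)
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

DF_new = drop_duplicate_columns(DF)
print(list(DF_new.columns))  # ['col1', 'col2', 'col5']
```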

Step by step:

df2 = df.T  # .T transposes the frame: each original column becomes a row

'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col3  0.67075  0.707864  0.206923  0.168023
col4  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# each original column is now a row, so drop_duplicates() removes the repeated ones
df2 = df2.drop_duplicates()
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# then transpose again to restore the original orientation
df2 = df2.T
'''
       col1      col2  col5
1  0.670750  2.670750   5.0
2  0.707864  2.707864   6.0
3  0.206923  2.206923   7.0
4  0.168023  2.168023   8.0
'''
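The round trip through .T casts every column to one common dtype, which is why col5 comes back as 5.0, 6.0, and so on. A variant (not part of the original answer) uses the transpose only to build a boolean mask with DataFrame.duplicated() and then selects from the original frame, so each kept column keeps its own dtype. The data is again the question's frame with an assumed fixed seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the example is reproducible
x = rng.uniform(size=4)
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# One boolean per original column: True if its values repeat an earlier column
is_dup = DF.T.duplicated()

# Select from DF itself, so col1/col2 stay float64 and col5 stays integer
DF_new = DF.loc[:, ~is_dup]
print(DF_new.dtypes)
```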

This should do what you need:

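# For each row, flag values that repeat an earlier value in that same row,
# then keep only the columns that are not flagged in every row
df = df.loc[:, ~df.apply(lambda x: x.duplicated(), axis=1).all()].copy()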

As you can see from this link.
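One caveat worth checking: this row-wise mask drops a column when each of its values repeats some earlier value in the same row, and those earlier values need not all come from the same column. A small made-up frame (not from the question) shows the difference from the transpose-based approach, which compares whole columns:

```python
import pandas as pd

# Made-up data: c is not a copy of a or b, but each of its values
# appears earlier in its own row (row 0 matches a, row 1 matches b)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1, 4]})

# Row-wise approach from this answer
row_mask = df.apply(lambda x: x.duplicated(), axis=1).all()
print(list(df.loc[:, ~row_mask].columns))  # ['a', 'b']: c is dropped

# Transpose-based approach compares whole columns
col_mask = df.T.duplicated()
print(list(df.loc[:, ~col_mask].columns))  # ['a', 'b', 'c']
```

On the question's data both approaches agree, because col3 and col4 are exact copies of col1 and col2 and the random values cannot coincide with each other or with col5.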
