Pandas Dataframe: How to drop_duplicates() based on index subset?
Wondering if someone could please help me with this:

I have a pandas df with a rather large number of columns (over 50). I'd like to remove duplicates based on a subset (columns 2 to 50).

I've been trying df.drop_duplicates(subset=["col1","col2",....]), but I'm wondering if there is a way to pass the column indices instead, so I don't have to write out all the column headers to consider for the drop and can instead do something along the lines of df.drop_duplicates(subset = [2:]).

Thanks upfront
You can slice df.columns, like:

df.drop_duplicates(subset = df.columns[2:])

Or:

df.drop_duplicates(subset = df.columns[2:].tolist())
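A minimal sketch of how this works, using a small made-up frame (the column names `id`, `name`, `x`, `y` are just for illustration): `df.columns[2:]` selects every column label from position 2 onward, and passing that slice as `subset` makes `drop_duplicates` compare rows only on those columns.

```python
import pandas as pd

# Illustrative frame: rows 0 and 1 are identical in the columns from
# position 2 onward ("x" and "y"), but differ in the first two columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "x": [10, 10, 20],
    "y": [5, 5, 7],
})

# Compare rows only on df.columns[2:], i.e. ["x", "y"];
# the first row of each duplicate group is kept.
deduped = df.drop_duplicates(subset=df.columns[2:])
print(deduped)
```

Here row 1 is dropped because it matches row 0 on `x` and `y`, even though `id` and `name` differ.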