
Pandas Dataframe: How to drop_duplicates() based on index subset?

Wondering if someone could please help me with this:

I have a pandas df with a rather large number of columns (over 50). I'd like to remove duplicates based on a subset (columns 2 to 50).

I've been trying to use df.drop_duplicates(subset=["col1","col2",....]), but I'm wondering if there is a way to pass column indices instead, so I don't have to actually write out all the column headers to consider for the drop, and can instead do something along the lines of df.drop_duplicates(subset = [2:]).

Thanks upfront.

You can slice df.columns like:

df.drop_duplicates(subset = df.columns[2:])

Or:或者:

df.drop_duplicates(subset = df.columns[2:].tolist())
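A minimal sketch of the approach, using a small made-up DataFrame (the column names here are illustrative, not from the question): slicing df.columns gives the positional subset without typing out every header.

```python
import pandas as pd

# Hypothetical data: "id" and "name" differ on every row, but we only
# want to deduplicate on the columns from position 2 onward.
df = pd.DataFrame({
    "id":    [1, 2, 3, 4],
    "name":  ["a", "b", "c", "d"],
    "col_a": [10, 10, 20, 20],
    "col_b": [1, 1, 2, 2],
})

# df.columns[2:] is an Index of the column labels from position 2 on;
# drop_duplicates accepts it directly as the subset.
deduped = df.drop_duplicates(subset=df.columns[2:])
print(deduped)
# Rows 0 and 2 are kept; rows 1 and 3 duplicate them on col_a/col_b.
```

drop_duplicates keeps the first occurrence of each duplicate group by default; pass keep="last" or keep=False to change that.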
