Pandas Dataframe: How to drop_duplicates() based on index subset?
Wondering if someone could please help me with this:

I have a pandas df with a rather large number of columns (over 50). I'd like to remove duplicates based on a subset (columns 2 to 50).

I've been trying df.drop_duplicates(subset=["col1","col2",....]), but I'm wondering if there is a way to pass the column indices instead, so I don't have to write out all the column headers to consider for the drop and can instead do something along the lines of df.drop_duplicates(subset = [2:]).

Thanks upfront
You can slice df.columns, like:

df.drop_duplicates(subset = df.columns[2:])

Or:

df.drop_duplicates(subset = df.columns[2:].tolist())
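A minimal sketch of how this works, using a small made-up frame (the column names `id`, `name`, `x`, `y` are just for illustration): `df.columns[2:]` selects every column label from position 2 onward, and passing that slice as `subset` makes `drop_duplicates` compare rows only on those columns.

```python
import pandas as pd

# Illustrative frame: rows 0 and 1 are identical in the columns from
# position 2 onward ("x" and "y"), but differ in the first two columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "x": [10, 10, 20],
    "y": [5, 5, 7],
})

# Compare rows only on df.columns[2:], i.e. ["x", "y"];
# the first row of each duplicate group is kept.
deduped = df.drop_duplicates(subset=df.columns[2:])
print(deduped)
```

Here row 1 is dropped because it matches row 0 on `x` and `y`, even though `id` and `name` differ.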