
How to drop rows with duplicate column values when the number of columns is not fixed?

I have a dataframe whose number of columns can vary (anywhere from 2 to 50). For example, here it has 2 columns. I want to remove the rows where site1 and site2 are the same.

import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]], columns=['site1', 'site2'])


I want to drop the rows whose column values are identical, so that only rows 0 and 2 remain.

df[df["site1"] != df["site2"]]

This line does it, but the number of columns is not fixed and this code runs inside a loop, so I need the fastest way to do this.

I appreciate the help in advance.

Thanks.

If you have more columns, you can use set() + len():

x = df[~df.apply(lambda x: len(set(x)), axis=1).eq(1)]
print(x)

Prints:

    site1   site2
0  507814  501972
2  508110  508161

Edit: To specify columns:

x = df[~df[["site1", "site2"]].apply(lambda x: len(set(x)), axis=1).eq(1)]
print(x)

Prints:

    site1   site2   site3
0  507814  501972  508284
2  508110  508161  508098

df used:

    site1   site2   site3
0  507814  501972  508284
1  529389  529389  508284
2  508110  508161  508098
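Since the question asks for speed, note that a row-wise apply calls a Python function once per row. A minimal sketch of a vectorized alternative is below: it compares every selected column against the first one and drops rows where all of them match. The names cols, sub and all_same are made up for illustration, and this has not been benchmarked here.

```python
import pandas as pd

df = pd.DataFrame(
    [[507814, 501972, 508284], [529389, 529389, 508284], [508110, 508161, 508098]],
    columns=["site1", "site2", "site3"],
)

# Columns to check; use list(df.columns) to check all of them.
cols = ["site1", "site2"]
sub = df[cols]

# Compare each selected column with the first selected column, row by row.
# A row is "all the same" when every comparison is True.
all_same = sub.eq(sub.iloc[:, 0], axis=0).all(axis=1)

result = df[~all_same]
print(result)
```

Because cols is just a list, the same lines work whether the frame has 2 columns or 50.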

You can do it like this:

df = df[df.nunique(axis=1) > 1]
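As a sketch of how this could be reused when the number of columns changes between loop iterations, the nunique filter can be wrapped in a small helper. The function name drop_uniform_rows and its cols parameter are hypothetical, not part of pandas. One thing to be aware of: nunique ignores NaN by default, so a row holding one value plus NaN counts as having a single distinct value.

```python
import pandas as pd

def drop_uniform_rows(df, cols=None):
    """Drop rows whose values are identical across cols (all columns if None)."""
    subset = df if cols is None else df[cols]
    # nunique(axis=1) counts distinct values per row; 1 means every value matches.
    return df[subset.nunique(axis=1) > 1]

two = pd.DataFrame(
    [[507814, 501972], [529389, 529389], [508110, 508161]],
    columns=["site1", "site2"],
)
three = two.assign(site3=[508284, 508284, 508098])

kept_two = drop_uniform_rows(two)                                # checks both columns
kept_three = drop_uniform_rows(three)                            # checks all three columns
kept_subset = drop_uniform_rows(three, cols=["site1", "site2"])  # checks only two of them
```

With all three columns checked, row 1 is kept because site3 differs from the other two; restricting the check to site1 and site2 drops it.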

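Here is another way. It works if all your site values are numbers. Note that the row-wise differences sum to the last column minus the first, so with more than two columns take the absolute value before summing:

df.loc[df.diff(axis=1).abs().sum(axis=1).ne(0)]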
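A small check of the diff-based filter on three columns, using made-up values rather than the asker's data. Summing the raw differences telescopes to last minus first, so a row like 1, 2, 1 sums to zero even though its values differ. Summing absolute differences gives zero only when every value in the row is equal.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [5, 5, 5], [3, 4, 6]],
    columns=["site1", "site2", "site3"],
)

# Raw differences: (site2 - site1) + (site3 - site2) == site3 - site1
plain = df.diff(axis=1).sum(axis=1)

# Absolute differences: zero only if all neighbouring columns are equal
absolute = df.diff(axis=1).abs().sum(axis=1)

kept_plain = df.loc[plain.ne(0)]
kept_abs = df.loc[absolute.ne(0)]
```

Here kept_plain loses row 0 as well as row 1, while kept_abs drops only row 1.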

Using your example, this filters out the rows where site1 == site2:

# first option
df[~df.apply(lambda x: x["site1"] == x["site2"], axis=1)]

# second option
df.query("site1 != site2")

Both options give you:

    site1   site2
0   507814  501972
2   508110  508161
