Remove rows where values appear in all columns in Pandas

Here is a very simple dataframe:

df = pd.DataFrame({'col1' :[1,2,3], 
                   'col2' :[1,3,3] })

I'm trying to remove rows where there are duplicate values (e.g., row 3).

This doesn't work:

df = df[(df.col1 != 3 & df.col2 != 3)]

and the documentation is pretty clear about why, which makes sense.

But I still don't know how to delete that row.

Does anyone have any ideas? Thanks. Monica.

If I understand your question correctly, I think you were close.

Starting from your data:

In [20]: df
Out[20]: 
   col1  col2
0     1     1
1     2     3
2     3     3

And doing this:

In [21]: df = df[df['col1'] != df['col2']]

Returns:

In [22]: df
Out[22]: 
   col1  col2
1     2     3
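For what it's worth, here is a minimal sketch of why the original attempt fails and how it differs from the column comparison above. The variable names `raised`, `not_three` and `distinct` are just illustrative. The assumption is that the failure comes from operator precedence: `&` binds more tightly than `!=`, so the unparenthesized expression becomes a chained comparison that needs the truth value of a whole Series.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [1, 3, 3]})

# Without inner parentheses this parses as
#   df.col1 != (3 & df.col2) != 3
# i.e. a chained comparison joined by an implicit `and`,
# which asks for bool(Series) and raises ValueError.
try:
    df[(df.col1 != 3 & df.col2 != 3)]
    raised = False
except ValueError:
    raised = True

# With each comparison parenthesized the expression runs, but it
# drops every row that contains a 3, which is a different filter.
not_three = df[(df.col1 != 3) & (df.col2 != 3)]

# Comparing the two columns directly keeps only rows whose values differ.
distinct = df[df['col1'] != df['col2']]
```

Note that the parenthesized version keeps only the row at index 0, while the column comparison keeps only the row at index 1, so they answer different questions.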

What about:

In [43]: df = pd.DataFrame({'col1' :[1,2,3], 
                   'col2' :[1,3,3] })

In [44]: df[df.max(axis=1) != df.min(axis=1)]
Out[44]: 
   col1  col2
1     2     3

[1 rows x 2 columns]

We want to remove rows whose values show up in all columns, in other words rows whose values are all equal, which means their minimum and maximum are equal. This method works on a DataFrame with any number of columns. Applying the above removes rows 0 and 2.
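To illustrate the "any number of columns" point, here is a small sketch on a three-column frame. The frame `wide` and its column names are made up for the example, and the `nunique` variant is an alternative way to express the same idea rather than part of the answer above.

```python
import pandas as pd

# Hypothetical three-column data; rows 0 and 2 hold one repeated value.
wide = pd.DataFrame({'a': [1, 2, 3, 4],
                     'b': [1, 3, 3, 4],
                     'c': [1, 2, 3, 5]})

# A row has all-equal values exactly when its min equals its max.
mask_minmax = wide.max(axis=1) != wide.min(axis=1)

# Equivalent idea: count the distinct values in each row.
mask_nunique = wide.nunique(axis=1) > 1

kept = wide[mask_minmax]
```

Both masks select the same rows here (index 1 and 3). The min/max version needs values that can be ordered, while `nunique` only needs them to be comparable for equality.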

Any row whose values are all the same will have a standard deviation of zero. One way to filter them is as follows:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1' :[1, 2, 3, np.nan], 
                   'col2' :[1, 3, 3, np.nan]})

>>> df.loc[df.std(axis=1, skipna=False) > 0]
   col1  col2
1   2.0   3.0
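As a rough sketch of what `skipna=False` changes, the example below uses a made-up three-column frame (`df_nan`, with a hypothetical `col3`) in which one row is only partly missing. The names `strict` and `lenient` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has two different values and one NaN.
df_nan = pd.DataFrame({'col1': [1, 2, 3, 1],
                       'col2': [1, 3, 3, 2],
                       'col3': [1, 3, 3, np.nan]})

# skipna=False: any NaN in a row makes its std NaN, and NaN > 0 is False,
# so rows with missing values are dropped along with the constant rows.
strict = df_nan.loc[df_nan.std(axis=1, skipna=False) > 0]

# Default skipna=True: NaN is ignored, so the last row is judged
# on its two remaining values and is kept.
lenient = df_nan.loc[df_nan.std(axis=1) > 0]
```

Which behaviour is preferable depends on whether a row with missing values should count as "all the same"; the question does not say, so treat this as one possible reading.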
