Remove rows where values appear in all columns in Pandas
Here is a very simple DataFrame:
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [1, 3, 3]})
I'm trying to remove rows where the same value appears in every column (e.g., row 3).
This doesn't work,
df = df[(df.col1 != 3 & df.col2 != 3)]
and the documentation is pretty clear about why, which makes sense.
But I still don't know how to delete that row.
Does anyone have any ideas? Thanks.
Monica.
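For reference, the attempt above fails because of Python operator precedence. The & operator binds more tightly than !=, so df.col1 != 3 & df.col2 != 3 is parsed as the chained comparison df.col1 != (3 & df.col2) != 3. Evaluating that chain requires the truth value of a Series, which raises ValueError. The following minimal sketch shows that parenthesizing each comparison makes the expression run. It still filters on the literal value 3, though, not on rows whose values are all equal.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [1, 3, 3]})

# Without parentheses, & is applied first and the resulting chained
# comparison asks for bool(Series), which pandas refuses with ValueError.
raised = False
try:
    df[(df.col1 != 3 & df.col2 != 3)]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With each comparison parenthesized the mask is built elementwise, but it
# keeps only rows where neither column equals 3: row 0 (1, 1) survives and
# row 1 (2, 3) is dropped, the opposite of what the question wants.
fixed = df[(df.col1 != 3) & (df.col2 != 3)]
print(fixed)
```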
If I understand your question correctly, I think you were close.
Starting from your data:
In [20]: df
Out[20]:
col1 col2
0 1 1
1 2 3
2 3 3
And doing this:
In [21]: df = df[df['col1'] != df['col2']]
Returns:
In [22]: df
Out[22]:
col1 col2
1 2 3
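The != comparison above only looks at two columns. The following sketch is one way to extend the same elementwise idea to any number of columns, and it is an extension, not part of the original answer. It compares every column against the first with DataFrame.ne(..., axis=0) and keeps rows where at least one column differs. The third column, col3, is hypothetical and only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [1, 3, 3],
                   'col3': [1, 3, 4]})  # hypothetical extra column

# Compare each column with the first column row by row (axis=0 aligns the
# Series with the DataFrame's index); keep a row if any column differs.
mask = df.ne(df.iloc[:, 0], axis=0).any(axis=1)
result = df[mask]
print(result)
```

Row 0 (1, 1, 1) is dropped. Row 2 survives here only because the hypothetical col3 holds 4.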
What about:
In [43]: df = pd.DataFrame({'col1' :[1,2,3],
'col2' :[1,3,3] })
In [44]: df[df.max(axis=1) != df.min(axis=1)]
Out[44]:
col1 col2
1 2 3
[1 rows x 2 columns]
We want to remove rows whose values show up in all columns. In other words, all values in the row are equal, which means the row's minimum and maximum are equal. This method works on a DataFrame with any number of columns. Applying it to your data removes rows 0 and 2.
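Here is a self-contained sketch of the same min/max test, wrapped in a hypothetical helper named drop_constant_rows and run on a three-column frame to show that the column count does not matter.

```python
import pandas as pd

def drop_constant_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose values are identical across all columns.

    A row is constant exactly when its maximum equals its minimum.
    """
    return df[df.max(axis=1) != df.min(axis=1)]

df = pd.DataFrame({'col1': [1, 2, 3, 5],
                   'col2': [1, 3, 3, 5],
                   'col3': [1, 3, 3, 6]})

# Rows 0 (1, 1, 1) and 2 (3, 3, 3) are constant and get dropped.
result = drop_constant_rows(df)
print(result)
```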
Any row whose values are all the same has a standard deviation of zero. One way to filter such rows out is as follows:
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [1, 3, 3, np.nan]})

# Keep only rows whose values vary (standard deviation > 0).
>>> df.loc[df.std(axis=1, skipna=False) > 0]
   col1  col2
1   2.0   3.0
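The std and min/max answers behave differently on missing values. This minimal sketch runs both filters on the same frame. With skipna=False, a row containing NaN has a NaN standard deviation, and NaN > 0 is False, so the std filter drops the all-NaN row. The min/max filter keeps that row: its max and min are both NaN, and NaN != NaN is True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [1, 3, 3, np.nan]})

# Standard-deviation filter: the all-NaN row has std NaN, and NaN > 0 is False.
by_std = df.loc[df.std(axis=1, skipna=False) > 0]

# Min/max filter: max and min of the all-NaN row are both NaN, and
# NaN != NaN is True, so that row is kept.
by_range = df[df.max(axis=1) != df.min(axis=1)]

print(list(by_std.index))    # [1]
print(list(by_range.index))  # [1, 3]
```

If all-NaN rows should be removed as well, the std version (or an explicit dropna) is the safer choice.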