
Select rows based on certain conditions with pandas

I'd like to return the rows in which all columns are > 0, or in which only 2012 may be < 0.

import pandas as pd
import numpy as np

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
   'B': [5,5,6,7,5,6,6,7,7,6,7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

@jezrael, not exactly. I changed the DataFrame to explain it better. In the final result I need the rows where all columns are > 0 and also the rows where all columns are > 0 except for 2012, which may be < 0. The result must be a new df with the columns that qualify. So, in the example below: g yes, d no.

df = pd.DataFrame( {
   'A': ['d','d','d','d','d','d','g','g','g','g','g','g'],
   'B': [5,5,6,-7,5,6,-6,7,7,6,-7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)

S  2012  2013  2014  2015  2016
A                              
d    13     6     7    -6     6
g   -11     8     8     8     7

Edited DataFrame:

df = pd.DataFrame( {
   'A':  ['d','d','d','d','d','d','g','g','g','g','g','g',
    'k','k','k','k','k','k'],
   'B': [5,5,6,7,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
   'S':   [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,
        2013,2012,2013,2014,2015,2016,2014]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

S  2012  2013  2014  2015  2016
A                              
d    13     6     7     8     6
g   -11     8     8     8     7
k    -6     9    48     8    -5

I think you can use two masks, one to select rows and one to select columns:

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','g','h','h','h', 'f'],
   'B': [5,5,6,7,5,6,-6,7,7,7,6,7,7,2],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2012,2013,2014,2015,2016,2012,2013,2013]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
S  2012  2013  2014  2015  2016
A                              
d     6     6     7     0     0
f     7     3     0     8     6
g    -5     8     8     8     0
h     8     8     0     0     7

mask1 = df[2012] < 0
print (mask1)
A
d    False
f    False
g     True
h    False
Name: 2012, dtype: bool

mask2 = (df > 0).all()
print (mask2)
S
2012    False
2013     True
2014    False
2015    False
2016    False
dtype: bool

print (df.loc[mask1, mask2])
S  2013
A      
g     8

print (df[mask1])
S  2012  2013  2014  2015  2016
A                              
g    -5     8     8     8     0

print (df.loc[:,mask2])
S  2013
A      
d     6
f     3
g     8
h     8
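
The two masks above work along different axes: `mask1` has one entry per row, while `(df > 0).all()` reduces down each column and so has one entry per column. Here is a minimal sketch on a small made-up frame that contrasts the default `all()` with `all(axis=1)`. The name `toy` and its values are illustrative and do not come from the question.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the two reduction axes.
toy = pd.DataFrame(
    {2012: [-5, 6], 2013: [8, 6], 2014: [8, 7]},
    index=pd.Index(["g", "d"], name="A"),
)

positive = toy > 0

# Default axis=0: one boolean per column (is every row positive in this column?).
col_mask = positive.all()

# axis=1: one boolean per row (is every column positive in this row?).
row_mask = positive.all(axis=1)

# .loc accepts a row mask and a column mask together.
subset = toy.loc[toy[2012] < 0, col_mask]

print(col_mask)
print(row_mask)
print(subset)
```

A row mask must be aligned with the index and a column mask with the columns, which is why the two cannot be swapped inside `.loc`.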

EDIT, following the edit to the question:

mask1 = df[2012] < 0
print (mask1)
A
d    False
g     True
Name: 2012, dtype: bool

mask2 = (df.drop(2012, axis=1) > 0).all(axis=1)
print (mask2)
A
d    False
g     True
dtype: bool

print (df[mask1 & mask2])
S  2012  2013  2014  2015  2016
A                              
g   -11     8     8     8     7
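
The question also asks for rows in which every column is positive, and `mask1 & mask2` on its own leaves those out: row `d` in the edited three-row frame has no negative 2012 value. The sketch below assumes a row qualifies when all non-2012 columns are > 0 and 2012 is either > 0 or < 0, and applies that to the edited data from the question. The names `others_positive`, `all_positive`, `keep`, and `result` are illustrative.

```python
import pandas as pd

# Edited data from the question (rows d, g and k).
df = pd.DataFrame({
    'A': ['d'] * 6 + ['g'] * 6 + ['k'] * 6,
    'B': [5, 5, 6, 7, 5, 6, -6, 7, 7, 6, -7, 7, -8, 7, -6, 6, -7, 50],
    'C': [1] * 12 + [2] * 6,
    'S': [2012, 2013, 2014, 2015, 2016, 2012, 2012, 2014, 2015, 2016, 2012,
          2013, 2012, 2013, 2014, 2015, 2016, 2014],
})
df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)

# True for rows where every column other than 2012 is positive.
others_positive = (df.drop(columns=2012) > 0).all(axis=1)

# True for rows where every column, 2012 included, is positive.
all_positive = (df > 0).all(axis=1)

# Assumed reading of the question: all positive, OR only 2012 is negative.
keep = all_positive | (others_positive & (df[2012] < 0))

result = df[keep]
print(result)
```

Under this reading, rows `d` and `g` are kept, and `k` is dropped because its 2016 value is negative.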

Combine the operators and use parentheses:

df[((df > 0).all(axis=1)) | (df[2012] < 0)]
Out[22]: 
Empty DataFrame
Columns: [2012, 2013, 2014, 2015, 2016]
Index: []
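
The parentheses matter because Python's `|` and `&` bind more tightly than comparison operators such as `<`, so an unparenthesized comparison is grouped with the wrong operand. A plain-Python sketch of the precedence rule follows; the values are arbitrary and chosen only for illustration.

```python
flag = True
value = 5

# Without parentheses this parses as (flag | value) < 0, i.e. (True | 5) < 0.
unparenthesized = flag | value < 0

# With parentheses the comparison is evaluated first, then combined with flag.
parenthesized = flag | (value < 0)

print(unparenthesized)  # False
print(parenthesized)    # True
```

The same grouping applies to pandas masks, which is why each comparison in a combined condition is wrapped in its own parentheses.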
