Select rows based on certain conditions with pandas
I'd like to return the rows in which all columns are > 0, or in which only the 2012 column is allowed to be < 0.
import pandas as pd
import numpy as np
df = pd.DataFrame( {
'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
'B': [5,5,6,7,5,6,6,7,7,6,7,7],
'C': [1,1,1,1,1,1,1,1,1,1,1,1],
'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]
} );
df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
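The reshape above adds B + C, sums the result for each (A, S) pair, and then unstack moves the S values into the columns. Pairs that never occur become 0 because of fill_value=0, so in a "> 0" test they count as non-positive. The sketch below builds the same table with pivot_table instead. This is an alternative I am substituting, not the asker's code, and the helper column name V is made up. It also shows that the column labels are integer years, which is why they are selected as df[2012] and not df['2012'].

```python
import pandas as pd

raw = pd.DataFrame({
    'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
    'B': [5,5,6,7,5,6,6,7,7,6,7,7],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1],
    'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013],
})

# The asker's reshape: sum B + C per (A, S), then move S into the columns.
wide = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)

# The same table via pivot_table on a hypothetical helper column V = B + C.
pivoted = raw.assign(V=raw.B + raw.C).pivot_table(
    index='A', columns='S', values='V', aggfunc='sum', fill_value=0)

# Raises AssertionError if the two tables differ (dtype may be int or float).
pd.testing.assert_frame_equal(wide, pivoted, check_dtype=False)

# The column labels are the integer years taken from S.
print(wide.columns.tolist())  # [2012, 2013, 2014, 2015, 2016]
```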
@jezrael, not exactly. I changed the dataframe to explain it better. In the final result I need the rows where all columns are > 0 AND the rows where all columns are > 0 except 2012, which can be < 0. The result must be a new df containing the rows that qualify. So, in the example below, g yes, d no.
df = pd.DataFrame( {
'A': ['d','d','d','d','d','d','g','g','g','g','g','g'],
'B': [5,5,6,-7,5,6,-6,7,7,6,-7,7],
'C': [1,1,1,1,1,1,1,1,1,1,1,1],
'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013]
} );
df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
S 2012 2013 2014 2015 2016
A
d 13 6 7 -6 6
g -11 8 8 8 7
EDITED dataframe:
df = pd.DataFrame( {
'A': ['d','d','d','d','d','d','g','g','g','g','g','g',
'k','k','k','k','k','k'],
'B': [5,5,6,7,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,
2013,2012,2013,2014,2015,2016,2014]
} );
df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
S 2012 2013 2014 2015 2016
A
d 13 6 7 8 6
g -11 8 8 8 7
k -6 9 48 8 -5
I think you can use two masks, one for filtering the rows and one for filtering the columns:
df = pd.DataFrame( {
'A': ['d','d','d','f','f','f','g','g','g','g','h','h','h', 'f'],
'B': [5,5,6,7,5,6,-6,7,7,7,6,7,7,2],
'C': [1,1,1,1,1,1,1,1,1,1,1,1,1,1],
'S': [2012,2013,2014,2015,2016,2012,2012,2013,2014,2015,2016,2012,2013,2013]
} );
df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
S 2012 2013 2014 2015 2016
A
d 6 6 7 0 0
f 7 3 0 8 6
g -5 8 8 8 0
h 8 8 0 0 7
mask1 = df[2012] < 0
print (mask1)
A
d False
f False
g True
h False
Name: 2012, dtype: bool
mask2 = (df > 0).all()
print (mask2)
S
2012 False
2013 True
2014 False
2015 False
2016 False
dtype: bool
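By default, all() reduces along axis=0, so mask2 above contains one entry per year column. To test each row instead, which is the direction the question is about, pass axis=1. Here is a small sketch on the answer's dataframe that contrasts the two:

```python
import pandas as pd

raw = pd.DataFrame({
    'A': ['d','d','d','f','f','f','g','g','g','g','h','h','h','f'],
    'B': [5,5,6,7,5,6,-6,7,7,7,6,7,7,2],
    'C': [1] * 14,
    'S': [2012,2013,2014,2015,2016,2012,2012,2013,2014,2015,2016,2012,2013,2013],
})
df = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)

per_column = (df > 0).all()       # axis=0: one result per year column
per_row = (df > 0).all(axis=1)    # axis=1: one result per value of A

print(per_column.tolist())        # [False, True, False, False, False]
print(per_row.index.tolist())     # ['d', 'f', 'g', 'h']
print(per_row.any())              # False: every row has a 0 or a negative value
```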
print (df.loc[mask1, mask2])
S 2013
A
g 8
print (df[mask1])
S 2012 2013 2014 2015 2016
A
g -5 8 8 8 0
print (df.loc[:,mask2])
S 2013
A
d 6
f 3
g 8
h 8
EDIT, following the edit of the question (using the asker's second dataframe, with rows d and g):
mask1 = df[2012] < 0
print (mask1)
A
d False
g True
Name: 2012, dtype: bool
mask2 = (df.drop(2012, axis=1) > 0).all(axis=1)
print (mask2)
A
d False
g True
dtype: bool
print (df[mask1 & mask2])
S 2012 2013 2014 2015 2016
A
g -11 8 8 8 7
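Because mask1 requires 2012 < 0, mask1 & mask2 drops any row in which every column, 2012 included, is positive. If those rows should also be kept, as the question's "AND the ones where..." wording suggests, the two cases can be combined with OR. The sketch below applies this to the EDITED dataframe. The names rest_positive, all_positive and exempt_2012 are illustrative only.

```python
import pandas as pd

raw = pd.DataFrame({
    'A': ['d','d','d','d','d','d','g','g','g','g','g','g',
          'k','k','k','k','k','k'],
    'B': [5,5,6,7,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
    'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,
          2013,2012,2013,2014,2015,2016,2014],
})
df = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)

# Every column other than 2012 is positive (same as mask2 in the edit).
rest_positive = (df.drop(columns=2012) > 0).all(axis=1)
# Case 1: all columns, 2012 included, are positive.
all_positive = (df > 0).all(axis=1)
# Case 2: 2012 is negative and the rest are positive (mask1 & mask2).
exempt_2012 = (df[2012] < 0) & rest_positive

result = df[all_positive | exempt_2012]
print(result.index.tolist())       # ['d', 'g']
print(df[exempt_2012].index.tolist())  # ['g']
```

Row k is excluded in both versions because its 2016 value is -5.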
Combine the conditions with the | operator and wrap each condition in parentheses:
df[((df > 0).all(axis=1)) | (df[2012] < 0)]
Out[22]:
Empty DataFrame
Columns: [2012, 2013, 2014, 2015, 2016]
Index: []
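The parentheses are needed because in Python, | binds more tightly than > and <. Without them, the expression is parsed as a chained comparison, and pandas refuses to reduce the intermediate Series to a single bool. Here is a minimal demonstration on a made-up Series s:

```python
import pandas as pd

s = pd.Series([3, -2, 5])

# Parenthesised: an element-wise OR of two boolean Series.
ok = (s > 0) | (s < 0)
print(ok.tolist())  # [True, True, True]

# Unparenthesised: parsed as s > (0 | s) < 0, a chained comparison that
# calls bool() on a Series and therefore raises ValueError.
raised = False
try:
    s > 0 | s < 0
except ValueError:
    raised = True
print(raised)  # True
```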
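Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.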