Using a Pandas loop to create a new dataframe based on combination of conditions in two columns

Question

Quite new to Python/Pandas and trying to perform an operation on my dataframe in Python 3 in iPython Notebook.

I have a df:

df=    Building ID   CorporationName  IndividualName
       1             Sample, LLC      John 
       1             n/a              Sam 
       1             n/a              Nancy 
       2             n/a              Tim
       2             n/a              Larry
       2             n/a              Paul 
       3             n/a              Rachel 
       4             Sample1, LLC     Dan

And I'd like to create a new dataframe that keeps only the buildings where every row with that Building ID has 'n/a' under CorporationName. Normally, this would be easy, but in this case we have duplicate Building ID values, even though the entire row is not a duplicate. So ideally, our output would look like this:

  no corp =     Building ID   CorporationName  IndividualName
                2              n/a              Tim
                2              n/a              Larry
                2              n/a              Paul 
                3              n/a              Rachel

My first inclination was to do something like:

nocorp = ownercombo[ownercombo.CorporationName == 'n/a']

But obviously this returns every row where CorporationName is 'n/a', including rows for a Building ID where only some of the entries are 'n/a', not all.
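
To make the problem concrete, here is a minimal sketch that rebuilds the sample data and applies the simple mask. The DataFrame construction is an assumption based on the table above (in particular, that 'n/a' is stored as a literal string and the column is named 'Building ID' with a space).

```python
import pandas as pd

# Sample data rebuilt from the table above (assumed layout).
ownercombo = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# Row-wise filter: keeps every 'n/a' row, no matter what the other
# rows for the same building contain.
nocorp = ownercombo[ownercombo['CorporationName'] == 'n/a']

# Building 1 is still present (Sam and Nancy), which is not wanted.
print(nocorp['Building ID'].unique())
```

The mask is evaluated one row at a time, so it has no way of knowing that building 1 also has a row with a corporation name.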

To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But, I realized if I do it this way it will just return four booleans:

In:    morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out:                CorporationName

        BuildingID  
             False    True
             True     True

I'm clearly not anywhere near the right track, so any help pointing me in the right direction would be extremely appreciated!

Answer 1

You could use groupby/filter, which keeps a group only if the function returns True for it:

In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]: 
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
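
For anyone who wants to run this end to end, here is a self-contained sketch of the same groupby/filter call, plus one possible alternative using transform. The sample DataFrame is rebuilt from the question, and the names by_filter, by_transform and is_na are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout).
df = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# groupby/filter: keep a group only if every row in it is 'n/a'.
by_filter = df.groupby('Building ID').filter(
    lambda g: (g['CorporationName'] == 'n/a').all()
)

# Alternative sketch: compute the per-row flag once, broadcast the
# per-group "all" back onto the rows, and use it as a boolean mask.
is_na = df['CorporationName'].eq('n/a')
by_transform = df[is_na.groupby(df['Building ID']).transform('all')]

print(by_filter)
print(by_transform)
```

Both variants should select the same rows. The transform version avoids calling a Python lambda per group, which may matter on a frame with many distinct Building ID values.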

Answer 2

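You can first find the unique Building ID values that have at least one row where CorporationName does not (~) contain the string n/a. Then you can filter the DataFrame with a mask built by isin, keeping only the rows whose Building ID is not in that list: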

uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
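
Note that str.contains does a substring match, so a corporation name that merely includes 'n/a' somewhere would also be treated as missing. A minimal sketch of the same idea with an exact comparison follows; the sample data is rebuilt from the question, and has_corp and nocorp are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout).
ownercombo = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# Buildings that have at least one row with a real corporation name.
has_corp = ownercombo.loc[
    ownercombo['CorporationName'] != 'n/a', 'Building ID'
].unique()

# Keep only the rows whose building is NOT in that list.
nocorp = ownercombo[~ownercombo['Building ID'].isin(has_corp)]

print(has_corp)
print(nocorp)
```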

solution1
1 2016-03-17 19:02:36

solution2
0 2016-03-17 18:59:59
