
Using a Pandas loop to create a new dataframe based on combination of conditions in two columns

I'm quite new to Python/Pandas and am trying to perform an operation on my dataframe in Python 3 in an IPython Notebook.

I have a df:

df =
   Building ID   CorporationName  IndividualName
   1             Sample, LLC      John
   1             n/a              Sam
   1             n/a              Nancy
   2             n/a              Tim
   2             n/a              Larry
   2             n/a              Paul
   3             n/a              Rachel
   4             Sample1, LLC     Dan

I'd like to create a new dataframe that keeps only the buildings for which every row has 'n/a' under CorporationName. Normally this would be easy, but here the Building ID values repeat across rows, even though no entire row is a duplicate. So ideally, the output would look like this:

nocorp =
   Building ID   CorporationName  IndividualName
   2             n/a              Tim
   2             n/a              Larry
   2             n/a              Paul
   3             n/a              Rachel

My first inclination was to do something like:

nocorp = ownercombo[ownercombo.CorporationName == 'n/a']

But obviously this returns every row where CorporationName is 'n/a', including rows for buildings where only some of the entries are 'n/a', not all of them.
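
For illustration, here is a minimal sketch of why the plain boolean filter falls short. The frame is rebuilt by hand from the sample above and the name ownercombo follows the snippet in the question; both are assumptions about the real data, including the exact column label 'Building ID'.

```python
import pandas as pd

# Hand-built copy of the sample data (an assumption about the real frame).
ownercombo = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# Row-wise filter: keeps every 'n/a' row, regardless of its building.
nocorp = ownercombo[ownercombo['CorporationName'] == 'n/a']

# Building 1 still appears (Sam and Nancy), even though John's row
# for the same building names a corporation.
print(nocorp['Building ID'].unique())
```

The condition has to be evaluated per building rather than per row, which is what a group-wise operation provides.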

To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But, I realized if I do it this way it will just return four booleans:

In:    morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out:                CorporationName

        BuildingID  
             False    True
             True     True

I'm clearly not anywhere near the right track, so any help pointing me in the right direction would be extremely appreciated!

You could use groupby/filter, which keeps the rows of a group only when the function returns True for that group:

In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]: 
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
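
As a self-contained sketch of the same idea, the block below rebuilds the sample frame by hand (an assumption about the real data) and runs the groupby/filter call. It also shows a possible alternative that builds a boolean mask with transform('all') instead of calling a Python lambda per group; that variant is not from the answer above and is only a suggestion that may be faster on frames with many groups.

```python
import pandas as pd

# Hand-built copy of the sample data (an assumption about the real frame).
df = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# groupby/filter: keep a building only if all of its rows are 'n/a'.
filtered = df.groupby('Building ID').filter(
    lambda g: (g['CorporationName'] == 'n/a').all()
)

# Alternative sketch: broadcast the per-building "all n/a" flag back
# onto every row, then use it as an ordinary boolean mask.
is_na = df['CorporationName'].eq('n/a')
mask = is_na.groupby(df['Building ID']).transform('all')
masked = df[mask]

print(filtered)
print(masked.equals(filtered))
```

Both variants keep the original row labels, so the result can still be matched back to the source frame by index.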

You can first find the unique values of the Building ID column in rows where CorporationName does not ( ~ ) contain the string n/a. Those are the buildings to exclude, so you can then filter the DataFrame with a negated isin mask:

uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
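
A runnable Python 3 sketch of this two-step approach follows. The frame is rebuilt by hand from the question's sample (an assumption about the real data), and an exact comparison is used instead of str.contains so that a name that merely includes the text 'n/a' is not mistaken for a missing one. The notna() check is an extra precaution, not part of the answer above: pandas.read_csv converts the string 'n/a' to NaN by default, so data loaded from a file may hold NaN rather than the literal string.

```python
import pandas as pd

# Hand-built copy of the sample data (an assumption about the real frame).
ownercombo = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# Step 1: rows that actually name a corporation
# (neither the literal 'n/a' nor a real missing value).
corp = ownercombo['CorporationName']
has_corp = corp.notna() & (corp != 'n/a')

# Step 2: the buildings those rows belong to.
uni = ownercombo.loc[has_corp, 'Building ID'].unique()

# Step 3: drop every row of those buildings.
nocorp = ownercombo[~ownercombo['Building ID'].isin(uni)]

print(uni)
print(nocorp)
```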

