Quite new to Python/Pandas and trying to perform an operation on my dataframe in Python 3 in iPython Notebook.
I have a df:
df =
Building ID  CorporationName  IndividualName
1            Sample, LLC      John
1            n/a              Sam
1            n/a              Nancy
2            n/a              Tim
2            n/a              Larry
2            n/a              Paul
3            n/a              Rachel
4            Sample1, LLC     Dan
And I'd like to create a new dataframe, taking only the rows that have 'n/a' as a value under CorporationName for all matching BuildingID values. Normally, this would be easy, but in this case, we have duplicate BuildingID values, even though the entire row is not a duplicate. So ideally, our output would look like this:
nocorp =
Building ID  CorporationName  IndividualName
2            n/a              Tim
2            n/a              Larry
2            n/a              Paul
3            n/a              Rachel
My first inclination was to do something like:
nocorp = ownercombo[ownercombo.CorporationName == 'n/a']
But obviously this returns every row where CorporationName is 'n/a', including rows for a BuildingID where only some of the entries are 'n/a', not all of them.
To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But, I realized if I do it this way it will just return four booleans:
In: morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out: CorporationName
BuildingID
False True
True True
I'm clearly not anywhere near the right track, so any help pointing me in the right direction would be extremely appreciated!
You could use groupby/filter:
In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]:
   Building ID  CorporationName  IndividualName
3            2              n/a             Tim
4            2              n/a           Larry
5            2              n/a            Paul
6            3              n/a          Rachel
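For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the name nocorp is taken from the question; the real data may of course be loaded differently.

```python
import pandas as pd

# Sample data copied from the question (built by hand for illustration).
df = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# filter() keeps a whole group only when the function returns True for it,
# here: when every row of the group has 'n/a' as its CorporationName.
nocorp = df.groupby("Building ID").filter(
    lambda g: (g["CorporationName"] == "n/a").all()
)
print(nocorp)
```

One thing to check on the real data: if it was loaded with read_csv using default settings, 'n/a' is likely to have been parsed as NaN rather than kept as a string, in which case the test inside the lambda would probably need to be g["CorporationName"].isna().all() instead.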
You can first find the unique values of column Building ID for the rows where CorporationName does not (~) contain the string n/a. Then you can filter the DataFrame by a mask built with isin:
uni= ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID  CorporationName  IndividualName
3            2              n/a             Tim
4            2              n/a           Larry
5            2              n/a            Paul
6            3              n/a          Rachel
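The same two steps as a self-contained sketch in Python 3. The DataFrame is rebuilt by hand from the question's sample, has_corp is just an illustrative name, and the sketch swaps str.contains for an exact comparison (!=), on the assumption that 'n/a' is always the whole cell value.

```python
import pandas as pd

# Sample data copied from the question (built by hand for illustration).
ownercombo = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Step 1: Building IDs that have at least one row with a real corporation name.
has_corp = ownercombo.loc[
    ownercombo["CorporationName"] != "n/a", "Building ID"
].unique()

# Step 2: keep only the rows whose Building ID is NOT among those IDs.
nocorp = ownercombo[~ownercombo["Building ID"].isin(has_corp)]
print(nocorp)
```

This avoids calling a Python function once per group, so it may be the faster option on a large frame, though that is worth measuring on the actual data.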