Using a Pandas loop to create a new dataframe based on combination of conditions in two columns
I'm quite new to Python/Pandas and am trying to perform an operation on my dataframe in Python 3 in an IPython Notebook.
I have a df:
df =
Building ID  CorporationName  IndividualName
1            Sample, LLC      John
1            n/a              Sam
1            n/a              Nancy
2            n/a              Tim
2            n/a              Larry
2            n/a              Paul
3            n/a              Rachel
4            Sample1, LLC     Dan
And I'd like to create a new dataframe, taking only the rows whose BuildingID has 'n/a' as the value under CorporationName in every one of its rows. Normally, this would be easy, but in this case, we have duplicate BuildingID values, even though the entire row is not a duplicate. So ideally, our output would look like this:
nocorp =
Building ID  CorporationName  IndividualName
2            n/a              Tim
2            n/a              Larry
2            n/a              Paul
3            n/a              Rachel
My first inclination was to do something like:
nocorp = ownercombo[ownercombo.CorporationName == 'n/a']
But obviously this will also return rows from buildings where 'n/a' is true for only some of the entries related to a BuildingID, not all (for example, Sam and Nancy in building 1).
To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But I realized that if I do it this way, it will just return four booleans:
In: morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out: CorporationName
BuildingID
False True
True True
I'm clearly not anywhere near the right track, so any help pointing me in the right direction would be extremely appreciated!
You could use groupby/filter:
In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]:
Building ID CorporationName IndividualName
3 2 n/a Tim
4 2 n/a Larry
5 2 n/a Paul
6 3 n/a Rachel
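For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, so the exact values and the column name "Building ID" (with a space) are assumptions. It also shows a variant based on groupby/transform, which is not part of the answer above: it builds a boolean mask instead of calling a Python lambda once per group, which may be faster on frames with many groups.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed values and column names)
df = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# groupby/filter: keep a whole group only if every row in it is 'n/a'
via_filter = df.groupby("Building ID").filter(
    lambda g: (g["CorporationName"] == "n/a").all()
)

# Variant with transform: flag 'n/a' rows, then broadcast the
# per-building "all rows are n/a" result back onto each row
is_na = df["CorporationName"].eq("n/a")
all_na = is_na.groupby(df["Building ID"]).transform("all")
via_transform = df[all_na]

print(via_transform)
```

One caveat: if the data comes from read_csv with default settings, the string n/a is typically parsed as NaN, in which case the comparison would need to be isna() instead of == 'n/a'.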
You can first find the unique values of column Building ID for the rows where CorporationName does not ( ~ ) contain the string n/a. Then you can filter the DataFrame by a mask built with isin:
uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
Building ID CorporationName IndividualName
3 2 n/a Tim
4 2 n/a Larry
5 2 n/a Paul
6 3 n/a Rachel
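The same two steps as a self-contained Python 3 sketch. The ownercombo frame is rebuilt by hand from the question's table, so its values are an assumption, and has_corp is just an illustrative name for what the answer calls uni. Passing regex=False makes str.contains do a plain substring match, which is all that is needed here.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed values and column names)
ownercombo = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Step 1: buildings that have at least one row with a real corporation name
has_corp = ownercombo.loc[
    ~ownercombo["CorporationName"].str.contains("n/a", regex=False),
    "Building ID",
].unique()

# Step 2: keep only the rows whose building is NOT in that set
nocorp = ownercombo[~ownercombo["Building ID"].isin(has_corp)]

print(has_corp)
print(nocorp)
```

Note that str.contains matches substrings, so a corporation name that happens to include the text n/a would be treated as missing; an exact comparison such as .ne('n/a') avoids that if it matters for your data.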