Using a Pandas loop to create a new dataframe based on a combination of conditions in two columns

Question

I'm quite new to Python/Pandas and am trying to perform an operation on my dataframe in Python 3 in an iPython Notebook.

I have a df:

df =   Building ID   CorporationName   IndividualName
       1             Sample, LLC       John
       1             n/a               Sam
       1             n/a               Nancy
       2             n/a               Tim
       2             n/a               Larry
       2             n/a               Paul
       3             n/a               Rachel
       4             Sample1, LLC      Dan

I'd like to create a new dataframe that keeps only the rows whose BuildingID has 'n/a' under CorporationName in every one of its rows. Normally this would be easy, but in this case we have duplicate BuildingID values, even though the entire row is not a duplicate. So ideally, our output would look like this:

nocorp =   Building ID   CorporationName   IndividualName
           2             n/a               Tim
           2             n/a               Larry
           2             n/a               Paul
           3             n/a               Rachel

My first inclination was to do something like:

nocorp = ownercombo[ownercombo.CorporationName == 'n/a']

But obviously this also returns rows where 'n/a' is true for only some of the entries related to a BuildingID, not all of them.

To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But I realized that if I do it this way it will just return four booleans:

In:    morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out:                CorporationName

        BuildingID  
             False    True
             True     True

I'm clearly nowhere near the right track, so any help pointing me in the right direction would be greatly appreciated!

Answer 1

You could use groupby/filter:

In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]: 
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
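For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby/filter call. It assumes the columns are named exactly 'Building ID', 'CorporationName' and 'IndividualName', and that 'n/a' is stored as a literal string rather than NaN. The helper name all_na is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; 'n/a' is assumed to be a plain string.
df = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

def all_na(group):
    """Return True when every row of this building has no corporation."""
    return (group['CorporationName'] == 'n/a').all()

# filter() keeps all rows of the groups for which all_na returns True.
nocorp = df.groupby('Building ID').filter(all_na)
print(nocorp)
```

One thing to watch for: if the data was loaded with read_csv, the default NA handling may already have turned 'n/a' into NaN, in which case the comparison inside all_na would likely need to be group['CorporationName'].isna() instead.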

Answer 2

You can first find the unique values of the Building ID column in rows where CorporationName does not (~) contain the string n/a. Then you can filter the DataFrame with a mask built by isin:

uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
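The two steps above can be sketched as a self-contained Python 3 script. The sample frame is rebuilt from the question, and an exact comparison (!= 'n/a') is used in place of str.contains, which assumes the column holds exactly the string 'n/a'. The last part is not from this answer: it shows a single-mask alternative using groupby with transform('all'), included only for comparison.

```python
import pandas as pd

# Sample data copied from the question; 'n/a' is assumed to be a plain string.
ownercombo = pd.DataFrame({
    'Building ID': [1, 1, 1, 2, 2, 2, 3, 4],
    'CorporationName': ['Sample, LLC', 'n/a', 'n/a', 'n/a',
                        'n/a', 'n/a', 'n/a', 'Sample1, LLC'],
    'IndividualName': ['John', 'Sam', 'Nancy', 'Tim',
                       'Larry', 'Paul', 'Rachel', 'Dan'],
})

# Step 1: Building IDs that have at least one row with a real corporation.
has_corp = ownercombo['CorporationName'] != 'n/a'
uni = ownercombo.loc[has_corp, 'Building ID'].unique()

# Step 2: keep only the rows whose Building ID is not in that list.
nocorp = ownercombo[~ownercombo['Building ID'].isin(uni)]
print(nocorp)

# Alternative (an assumption, not part of the answer): broadcast a per-building
# "every row is n/a" flag back onto the rows and use it directly as the mask.
is_na = ownercombo['CorporationName'] == 'n/a'
mask = is_na.groupby(ownercombo['Building ID']).transform('all')
nocorp_alt = ownercombo[mask]
```

Both versions avoid calling a Python function once per group, which may matter on larger frames.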
