Using a Pandas loop to create a new dataframe based on combination of conditions in two columns

I'm quite new to Python/Pandas and am trying to perform an operation on my dataframe in Python 3 in an IPython Notebook.

I have a df:

df=    Building ID   CorporationName  IndividualName
       1             Sample, LLC      John 
       1             n/a              Sam 
       1             n/a              Nancy 
       2             n/a              Tim
       2             n/a              Larry
       2             n/a              Paul 
       3             n/a              Rachel 
       4             Sample1, LLC     Dan 

I'd like to create a new dataframe that keeps only the rows for which CorporationName is 'n/a' in every row sharing the same BuildingID. Normally this would be easy, but in this case we have duplicate BuildingID values, even though the entire row is not a duplicate. So ideally, our output would look like this:

  no corp =     Building ID   CorporationName  IndividualName
                2              n/a              Tim
                2              n/a              Larry
                2              n/a              Paul 
                3              n/a              Rachel 

My first inclination was to do something like:

nocorp = ownercombo[ownercombo.CorporationName == 'n/a']

But obviously this will also return rows for a BuildingID where 'n/a' is true for only some of its entries, not all.
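To make the problem concrete, here is a minimal sketch that rebuilds the sample frame from the table above and applies the row-wise filter. The literal data is copied from the question; the column label 'Building ID' (with a space) is an assumption based on how the table is printed.

```python
import pandas as pd

# Sample data copied from the table in the question.
ownercombo = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Row-wise filter: keeps every row whose own CorporationName is 'n/a',
# without looking at the other rows of the same building.
nocorp = ownercombo[ownercombo.CorporationName == "n/a"]

# Building 1 is still present (Sam and Nancy), even though John's row
# for the same building names a corporation.
print(nocorp["Building ID"].tolist())
```

Building 1 should be excluded entirely, which is why a per-row comparison is not enough here.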

To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But I realized that if I do it this way, it will just return four booleans:

In:    morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out:                CorporationName

        BuildingID  
             False    True
             True     True

I'm clearly nowhere near the right track, so any help pointing me in the right direction would be greatly appreciated!

You could use groupby/filter:

In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]: 
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
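For anyone who wants to run this end to end, here is a self-contained sketch of the same groupby/filter call. The frame is rebuilt from the question's table. The second part, a boolean mask built with transform, is an alternative that is not part of the answer above; it is included only as one possible variation.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Keep a building's rows only if every row in that group is 'n/a'.
no_corp = df.groupby("Building ID").filter(
    lambda g: (g["CorporationName"] == "n/a").all()
)
print(no_corp)

# Alternative (not from the answer): broadcast the per-group result
# back onto each row and use it as an ordinary boolean mask.
is_na = df["CorporationName"] == "n/a"
mask = is_na.groupby(df["Building ID"]).transform("all")
no_corp_alt = df[mask]
```

The filter version reads closest to the stated requirement. The mask version avoids calling a Python lambda once per group, which may matter on frames with many groups.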

You can first find the unique values of column Building ID among the rows whose CorporationName does not (~) contain the string n/a. Then you can filter the DataFrame with a mask built by isin:

uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
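The two steps above can be sketched as a self-contained Python 3 script. The frame is rebuilt from the question's table, and the intermediate name has_corp is introduced here only for readability. It assumes CorporationName holds strings throughout; real missing values (NaN) would need extra handling before negating the mask. Passing regex=False is an optional addition that makes the match a plain substring test.

```python
import pandas as pd

# Sample data copied from the table in the question.
ownercombo = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Step 1: rows that name a corporation (no 'n/a' in CorporationName).
has_corp = ~ownercombo["CorporationName"].str.contains("n/a", regex=False)

# Building IDs that have at least one such row; these must be excluded.
uni = ownercombo.loc[has_corp, "Building ID"].unique()
print(uni)

# Step 2: keep only rows whose Building ID is not in that set.
nocorp = ownercombo[~ownercombo["Building ID"].isin(uni)]
print(nocorp)
```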
