
pandas - filter on groups which have at least one column containing non-null values in a groupby

I have the following pandas DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': ['1', '1', '1', '2', '2', '3'],
    'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'],
    'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],
    'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456'],
})

>>> print(df)
  Id      A    B    C
0  1   TRUE  NaN  NaN
1  1   TRUE  NaN  NaN
2  1   TRUE  abc  NaN
3  2   TRUE  NaN  NaN
4  2   TRUE  NaN  NaN
5  3  FALSE  def  456

I want to end up with the following DataFrame:

>>> print(dfout)
  Id     A    B   C
0  1  TRUE  abc NaN

The same Id value can appear on multiple rows. Each Id has either TRUE or FALSE in column A, consistently across all of its rows. Columns B and C can hold any value, including NaN.
I want one row in dfout for each Id that has A=TRUE, showing the max value seen in columns B and C. But if columns B and C are NaN on all of an Id's rows, that Id should be excluded from dfout.

  • Id 1 has A=TRUE and has B=abc in its third row, so it meets the requirements.
  • Id 2 has A=TRUE, but columns B and C are NaN on both of its rows, so it does not.
  • Id 3 has A=FALSE, so it does not meet the requirements.

I created a groupby on Id, then applied a mask to keep only the rows with A=TRUE. But I'm having trouble working out how to remove the Ids whose rows are all NaN in columns B and C.

grouped = df.groupby(['Id'])
mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)

  Id     A    B    C
0  1  TRUE  NaN  NaN
1  1  TRUE  NaN  NaN
2  1  TRUE  abc  NaN
3  2  TRUE  NaN  NaN
4  2  TRUE  NaN  NaN

Then I tried several things along the lines of:

df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull

But I get errors such as:

TypeError: unhashable type: 'list'

I'm using Python 3.6 and pandas 0.23.0. I looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by

The solution has three parts:

  1. Filter the DataFrame to keep the rows where column A is TRUE.

  2. Group by Id and use first, which returns the first non-null value in each column for every group.

  3. Use dropna on the result with subset=['B', 'C'] and how='all', so a row is dropped only when both B and C are null.
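Written out one step at a time, the three parts might look like the sketch below. It assumes column A holds the strings 'TRUE'/'FALSE' exactly as in the question's sample data; the intermediate names only_true and per_id are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; column A holds the strings 'TRUE'/'FALSE'.
df = pd.DataFrame({
    'Id': ['1', '1', '1', '2', '2', '3'],
    'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'],
    'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],
    'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456'],
})

# Step 1: keep only the rows where A is 'TRUE' (this drops Id 3).
only_true = df.loc[df['A'] == 'TRUE']

# Step 2: one row per Id, taking the first non-null value in each column.
per_id = only_true.groupby('Id', as_index=False).first()

# Step 3: drop Ids whose B and C are both still null (this drops Id 2).
dfout = per_id.dropna(subset=['B', 'C'], how='all').reset_index(drop=True)

print(dfout)
```

Printing only_true and per_id along the way shows which Ids each step removes.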

    df.loc[df['A'] == 'TRUE'].groupby('Id', as_index=False).first().dropna(subset=['B', 'C'], how='all')

      Id     A    B    C
    0  1  TRUE  abc  NaN
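Note that first returns the first non-null value per group, which matches the max the question asks for only when a group has at most one distinct non-null value per column. As a hedged alternative, here is a sketch that stays closer to the mask approach in the question and takes the max explicitly. The helper max_ignoring_nan and the names value_cols, has_value and id_has_value are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': ['1', '1', '1', '2', '2', '3'],
    'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'],
    'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],
    'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456'],
})

value_cols = ['B', 'C']  # columns that must contain at least one value

# Row-level flags: A is 'TRUE', and at least one of B/C is non-null.
is_true = df['A'].eq('TRUE')
has_value = df[value_cols].notna().any(axis=1)

# Broadcast "some row of this Id has a value" back onto every row of that Id.
id_has_value = has_value.groupby(df['Id']).transform('any')

kept = df.loc[is_true & id_has_value]


def max_ignoring_nan(s):
    """Return the largest non-null value in s, or NaN if s has none."""
    s = s.dropna()
    return s.max() if len(s) else np.nan


# A is constant within an Id (per the question), so 'first' is enough for it.
dfout = kept.groupby('Id', as_index=False).agg(
    {'A': 'first', 'B': max_ignoring_nan, 'C': max_ignoring_nan}
)

print(dfout)
```

Dropping nulls before calling max avoids comparing strings with NaN in the object-typed columns, and the transform-based mask keeps every row of a qualifying Id available for the aggregation.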
