How to iterate over columns and check a condition by group

Question

I have data for many countries over a period of time (2001-2003). It looks like this:

index	year	country	inflation	GDP
1	2001	AFG	NaN	48
2	2002	AFG	NaN	49
3	2003	AFG	NaN	50
4	2001	CHI	3.0	NaN
5	2002	CHI	5.0	NaN
6	2003	CHI	7.0	NaN
7	2001	USA	NaN	220
8	2002	USA	4.0	250
9	2003	USA	2.5	280

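I want to drop a country if it has no data at all for any given variable (that is, the values are missing for every year).

In the example table above, I want to drop AFG (because all of its inflation values are missing) and CHI (because all of its GDP values are missing). I do not want to drop observation #7 just because one year is missing.

What is the best way to do this?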

Answer 1

This should work by filtering out every country for which inflation or GDP is NaN in all years:

(
    df.groupby(['country'])
    .filter(lambda x: not x['inflation'].isnull().all() and not x['GDP'].isnull().all())
)

Note that if you have more than two columns, you can use a more general version:

df.groupby(['country']).filter(lambda x: not x.isnull().all().any())
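To make the general version concrete, here is a minimal sketch run against the sample table from the question. The DataFrame construction and the names `df` and `kept` are assumptions added for illustration; only the `groupby(...).filter(...)` call comes from the answer.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (built by hand for illustration)
df = pd.DataFrame({
    "year": [2001, 2002, 2003] * 3,
    "country": ["AFG"] * 3 + ["CHI"] * 3 + ["USA"] * 3,
    "inflation": [np.nan, np.nan, np.nan, 3.0, 5.0, 7.0, np.nan, 4.0, 2.5],
    "GDP": [48, 49, 50, np.nan, np.nan, np.nan, 220, 250, 280],
})

# g.isnull().all() is True for each column that is missing in every row of the group;
# .any() then asks whether at least one such column exists, and the group is kept if not
kept = df.groupby("country").filter(lambda g: not g.isnull().all().any())

print(kept)  # only the USA rows remain, including the 2001 row with a missing inflation value
```

The group is judged as a whole, so a single missing year (as in the USA's 2001 inflation) does not remove anything.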

Answer 2

You can also try this:

# count the non-null values per country; a count of 0 means the column has no data for that country
# (a sum of 0 is not a reliable test, because real values can also add up to 0)
group_by = df.groupby(['country']).agg({'inflation': 'count', 'GDP': 'count'}).reset_index()

# extract only the countries that have data in both columns
indexes = group_by[(group_by['GDP'] != 0) & (group_by['inflation'] != 0)].index
final_countries = list(group_by.loc[group_by.index.isin(indexes), 'country'])

# keep only the rows that belong to those countries
df = df.drop(df[~df['country'].isin(final_countries)].index)
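The same per-country check can be sketched without the intermediate list of countries, by broadcasting the non-null counts back onto the original rows with `transform`. This is an alternative formulation rather than the answer's own code; the sample data and the names `counts` and `result` are assumptions for illustration. Counting non-null values, rather than summing, avoids treating a country whose values happen to add up to zero as empty.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (built by hand for illustration)
df = pd.DataFrame({
    "year": [2001, 2002, 2003] * 3,
    "country": ["AFG"] * 3 + ["CHI"] * 3 + ["USA"] * 3,
    "inflation": [np.nan, np.nan, np.nan, 3.0, 5.0, 7.0, np.nan, 4.0, 2.5],
    "GDP": [48, 49, 50, np.nan, np.nan, np.nan, 220, 250, 280],
})

# For every row, the number of non-null values its country has in each column
counts = df.groupby("country")[["inflation", "GDP"]].transform("count")

# Keep the rows whose country has at least one value in every checked column
result = df[(counts > 0).all(axis=1)]
```

Because `transform` returns a frame aligned with `df`, the boolean mask can be applied directly and the original row index is preserved.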

Answer 3

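You can reshape the data frame from long to wide, drop the nulls, and then convert it back to long.

To convert from long to wide, you can use the pivot function. See also this question.

Here is the code to drop the nulls after reshaping:

df.dropna(axis=0, how='any', inplace=True) # Delete rows where any value is null

To convert back to long, you can use pd.melt.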
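A minimal sketch of the reshape idea follows, under a few assumptions: the sample data is built by hand, the names `wide`, `has_data`, `keep` and `result` are illustrative, and the wide table uses one row per country with one column per (variable, year) pair. Note that a plain `dropna(how='any')` on such a table would also drop the USA because of its single missing year, so the sketch instead checks each variable for at least one non-null year. It also filters the original long frame rather than melting the wide one back.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (built by hand for illustration)
df = pd.DataFrame({
    "year": [2001, 2002, 2003] * 3,
    "country": ["AFG"] * 3 + ["CHI"] * 3 + ["USA"] * 3,
    "inflation": [np.nan, np.nan, np.nan, 3.0, 5.0, 7.0, np.nan, 4.0, 2.5],
    "GDP": [48, 49, 50, np.nan, np.nan, np.nan, 220, 250, 280],
})

# Long to wide: one row per country, columns are (variable, year) pairs
wide = df.pivot(index="country", columns="year", values=["inflation", "GDP"])

# For each country and variable: is there at least one non-null year?
has_data = wide.notna().T.groupby(level=0).any().T

# Keep the countries that have some data for every variable
keep = has_data.all(axis=1)
result = df[df["country"].isin(keep[keep].index)]
```

Filtering the original frame keeps the long layout and the original row order, which sidesteps the round trip through `pd.melt`.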

Answer 1
0 Accepted 2022-01-14 19:30:02

Answer 2
0 2022-01-14 19:35:12

Answer 3
0 2022-01-14 19:36:15
