Counting cells equal to the string 'nan' in each column of a pandas DataFrame

Question

Suppose I have the following pd.DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({
    'col_1': ['Elon', 'Jeff', 'Warren', 'Mark'],
    'col_2': ['nan', 'Bezos', 'Buffet', 'nan'],
    'col_3': ['nan', 'Amazon', 'Berkshire', 'Meta'],
})

This gives me:

    col_1   col_2   col_3
0   Elon    nan     nan
1   Jeff    Bezos   Amazon
2   Warren  Buffet  Berkshire
3   Mark    nan     Meta

All columns are of string type. I want a way to get, for each column, the number of rows whose cell value is 'nan'.

When I simply run the following, I always get zero as the missing count, because isna() does not treat the string 'nan' as a missing value.

>>> df.isna().sum()

col_1    0
col_2    0
col_3    0
dtype: int64

However, what I want to get is:

col_1    0
col_2    2
col_3    1

How can I achieve this?
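Why isna() reports zero: it only flags real missing values (np.nan, None, pd.NA), while the three-character string 'nan' is an ordinary value to pandas. A minimal sketch, assuming the 'nan' strings should really be treated as missing, is to convert them with DataFrame.replace first and then count with isna():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1': ['Elon', 'Jeff', 'Warren', 'Mark'],
    'col_2': ['nan', 'Bezos', 'Buffet', 'nan'],
    'col_3': ['nan', 'Amazon', 'Berkshire', 'Meta'],
})

# The string 'nan' is not a missing value, so isna() finds nothing.
print(df.isna().sum().tolist())  # [0, 0, 0]

# Assumption: 'nan' strings stand for missing data. Replace exact matches
# of the string with a real missing value, then count per column.
cleaned = df.replace('nan', np.nan)
missing_counts = cleaned.isna().sum()
print(missing_counts)
```

This changes the data itself, which is useful if later steps (dropna, fillna) should also see these cells as missing; if you only need the counts, comparing against the string directly avoids creating a second DataFrame.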

Answer 1

You have nan stored as a string, so you can compare against it directly:

df.eq("nan").sum()

Output:

col_1    0
col_2    2
col_3    1
dtype: int64
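A self-contained version of this answer, for trying it outside the interpreter session. The second count is a hypothetical extension beyond the question: if a column could hold both the string 'nan' and genuine missing values, adding the two counts reports either form.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': ['Elon', 'Jeff', 'Warren', 'Mark'],
    'col_2': ['nan', 'Bezos', 'Buffet', 'nan'],
    'col_3': ['nan', 'Amazon', 'Berkshire', 'Meta'],
})

# eq("nan") compares every cell with the string 'nan' (exact, case-sensitive
# match) and returns a boolean DataFrame; sum() counts the True values per column.
nan_string_counts = df.eq("nan").sum()
print(nan_string_counts)

# Hypothetical extension: also count real missing values (np.nan/None).
total_missing = df.eq("nan").sum() + df.isna().sum()
print(total_missing)
```

Because the match is exact, values such as 'NaN' or ' nan' with surrounding spaces would not be counted.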

Answer 2

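It took me a while to notice that you had changed the initial code for the dataset. However, if you want to extract all rows that contain the string 'nan', I would use a mask.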

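import numpy as np

# One boolean column per DataFrame column: True where the cell contains "nan"
mask = np.column_stack([df[col].str.contains("nan", na=False) for col in df])
# Keep the rows where at least one column matched
df_new = df.loc[mask.any(axis=1)]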

This creates a new DataFrame that you can experiment with.
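Note that str.contains matches substrings, so a cell such as 'Fernando' (a hypothetical value, not in the question's data) would also be selected. A minimal sketch comparing the answer's substring mask with an exact-match mask built from df.eq("nan"); both select rows 0 and 3 for the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1': ['Elon', 'Jeff', 'Warren', 'Mark'],
    'col_2': ['nan', 'Bezos', 'Buffet', 'nan'],
    'col_3': ['nan', 'Amazon', 'Berkshire', 'Meta'],
})

# Substring mask, as in the answer above.
substring_mask = np.column_stack(
    [df[col].str.contains("nan", na=False) for col in df]
)
rows_substring = df.loc[substring_mask.any(axis=1)]

# Exact-match mask: only cells equal to the whole string 'nan'.
rows_exact = df.loc[df.eq("nan").any(axis=1)]
print(rows_exact)

# The difference shows up on values that merely contain 'nan'.
# 'Fernando' is a hypothetical example value.
s = pd.Series(['nan', 'Fernando', 'Bezos'])
print(s.str.contains("nan").tolist())  # [True, True, False]
print(s.eq("nan").tolist())            # [True, False, False]
```

The exact-match version also stays in pandas, without building an intermediate NumPy array.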


2 solutions

Solution 1
Score 2, accepted, 2022-11-29 14:50:45

Solution 2
Score 0, 2022-11-29 14:58:22
