
Python: How to find which values in a column have NaN values in another specific column (dataframes)

Suppose we have df1 that looks like this:

import numpy as np
import pandas as pd

# NaN is not a built-in name in Python; use np.nan
x1 = [{'partner': "Afghanistan", 'commodity': np.nan},
      {'partner': "Zambia",      'commodity': 2},
      {'partner': "Germany",     'commodity': 2},
      {'partner': "Afghanistan", 'commodity': np.nan},
      {'partner': "Canada",      'commodity': np.nan},
      {'partner': "Italy",       'commodity': 3},
      {'partner': "Canada",      'commodity': np.nan},
      {'partner': "USA",         'commodity': np.nan}]

df1 = pd.DataFrame(x1)

What I want to do is see the list of values in partner that have a NaN value in commodity, but I don't want to have the same partner listed twice.

So my preferred result would look like this:

commodity_nan_partners=
Afghanistan
Canada
USA

and not:

Afghanistan
Afghanistan
Canada
Canada
USA

You can look for NaN values using isnull, then get unique values with unique or set:

>>> pd.Series(df1.loc[df1.commodity.isnull(),'partner'].unique())
0    Afghanistan
1         Canada
2            USA
dtype: object

# or, with set (the order of a set is not guaranteed)
>>> pd.Series(list(set(df1.loc[df1.commodity.isnull(),'partner'])))
0         Canada
1    Afghanistan
2            USA
dtype: object
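
As a self-contained sketch of the approach above (assuming the NaN values are numpy's np.nan; the names mask, nan_partners and nan_partners_sorted are only illustrative): unique keeps values in order of first appearance, while a set has no guaranteed order, so the set version is sorted here to get a stable result.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Germany", "Afghanistan",
                "Canada", "Italy", "Canada", "USA"],
    "commodity": [np.nan, 2, 2, np.nan, np.nan, 3, np.nan, np.nan],
})

# Boolean mask: True where commodity is missing
mask = df1["commodity"].isnull()

# unique() preserves the order of first appearance
nan_partners = list(df1.loc[mask, "partner"].unique())

# set() removes duplicates but is unordered, so sort for a stable result
nan_partners_sorted = sorted(set(df1.loc[mask, "partner"]))

print(nan_partners)
print(nan_partners_sorted)
```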

loc + isnull + drop_duplicates

You can filter your series and then drop duplicates:

res = df1.loc[df1['commodity'].isnull(), 'partner'].drop_duplicates()

print(res)

0    Afghanistan
4         Canada
7            USA
Name: partner, dtype: object

Step 1
Filter to keep only the partner values whose commodity is NaN:

v = df1.loc[df1.commodity.isna(), 'partner']

Or,

v = df1.partner[df1.commodity.isna()]

print(v)
0    Afghanistan
3    Afghanistan
4         Canada
6         Canada
7            USA
Name: partner, dtype: object

Step 2
Drop duplicates.

If you want a collection,

v.unique()
array(['Afghanistan', 'Canada', 'USA'], dtype=object)

Or,

set(v)
{'Afghanistan', 'Canada', 'USA'}

If you want a Series,

ser = v.drop_duplicates().reset_index(drop=True)
print(ser)

0    Afghanistan
1         Canada
2            USA
Name: partner, dtype: object

If you want a DataFrame,

df = ser.to_frame()
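
The two steps above can be put together in one small helper. This is only a sketch: the function name partners_with_nan and its parameters value_col and key_col are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def partners_with_nan(df, value_col="commodity", key_col="partner"):
    """Return a one-column DataFrame of unique key_col values whose value_col is NaN."""
    # Step 1: keep only the key_col entries where value_col is missing
    v = df.loc[df[value_col].isna(), key_col]
    # Step 2: drop repeated entries and renumber the index from 0
    ser = v.drop_duplicates().reset_index(drop=True)
    return ser.to_frame()


df1 = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Germany", "Afghanistan",
                "Canada", "Italy", "Canada", "USA"],
    "commodity": [np.nan, 2, 2, np.nan, np.nan, 3, np.nan, np.nan],
})

result = partners_with_nan(df1)
print(result)
```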

You could also use dropna; this just offers a different idea:

set(df1.partner.tolist())-set(df1.dropna().partner.tolist())
Out[94]: {'Afghanistan', 'Canada', 'USA'}
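
One caveat with the set difference: it returns partners that never appear with a non-NaN value, which matches the mask-based result on df1 only because no partner there has both NaN and non-NaN rows. The small frame below is made-up data (not from the question) to show where the two results diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Canada has one NaN row and one non-NaN row
df = pd.DataFrame({
    "partner": ["Canada", "Canada", "USA", "Italy"],
    "commodity": [np.nan, 5, np.nan, 3],
})

# Set difference: all partners minus partners that survive dropna()
by_difference = set(df["partner"]) - set(df.dropna()["partner"])

# Mask-based: partners with at least one NaN commodity
by_mask = set(df.loc[df["commodity"].isna(), "partner"])

print(by_difference)  # Canada is missing here
print(by_mask)        # Canada is included here
```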

Just another alternative:

>>> df1[df1.isnull().any(axis=1)]['partner'].drop_duplicates()
0    Afghanistan
4         Canada
7            USA
Name: partner, dtype: object
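
Note that isnull().any(axis=1) flags a row when any column is NaN, not just commodity. On df1 that makes no difference, but with an extra column it can. The sketch below uses a made-up year column purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a second column ("year") that also contains a NaN
df = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Canada"],
    "commodity": [np.nan, 2, np.nan],
    "year": [2019, np.nan, 2020],
})

# Rows with a NaN in any column
any_col = df[df.isnull().any(axis=1)]["partner"].drop_duplicates().tolist()

# Rows with a NaN in commodity only
one_col = df.loc[df["commodity"].isnull(), "partner"].drop_duplicates().tolist()

print(any_col)  # includes Zambia because of the missing year
print(one_col)
```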

Using loc + np.isnan:

>>> df1.loc[np.isnan(df1.commodity), 'partner'].drop_duplicates()
0    Afghanistan
4         Canada
7            USA
Name: partner, dtype: object
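
np.isnan works here because commodity is a float column. As a hedged illustration (the two Series below are made-up, not from the question), it raises a TypeError on an object-dtype column, where isna still works:

```python
import numpy as np
import pandas as pd

numeric = pd.Series([np.nan, 2.0, 3.0])            # float dtype
mixed = pd.Series([None, "2", "3"], dtype=object)  # object dtype

# np.isnan is fine on numeric data
numeric_mask = np.isnan(numeric).tolist()

# On object dtype, np.isnan is not supported and raises TypeError
try:
    np.isnan(mixed)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# Series.isna handles both cases
mixed_mask = mixed.isna().tolist()

print(numeric_mask, isnan_failed, mixed_mask)
```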

