Python: How to find which values in a column have NaN values in another specific column (dataframes)
Suppose we have df1 that looks like this:
import numpy as np
import pandas as pd

# A bare NaN is not defined in Python; use np.nan for missing values
x1 = [{'partner': "Afghanistan", 'commodity': np.nan},
      {'partner': "Zambia", 'commodity': 2},
      {'partner': "Germany", 'commodity': 2},
      {'partner': "Afghanistan", 'commodity': np.nan},
      {'partner': "Canada", 'commodity': np.nan},
      {'partner': "Italy", 'commodity': 3},
      {'partner': "Canada", 'commodity': np.nan},
      {'partner': "USA", 'commodity': np.nan}]
df1 = pd.DataFrame(x1)
What I want to do is see the list of values in partner that have a NaN value in commodity, but I don't want the same partner listed twice.
So my preferred result would look like this:
commodity_nan_partners=
Afghanistan
Canada
USA
and not:
Afghanistan
Afghanistan
Canada
Canada
USA
You can look for NaN values using isnull, then get unique values with unique or set:
>>> pd.Series(df1.loc[df1.commodity.isnull(),'partner'].unique())
0 Afghanistan
1 Canada
2 USA
dtype: object
# or
>>> pd.Series(list(set(df1.loc[df1.commodity.isnull(),'partner'])))
0 Canada
1 Afghanistan
2 USA
dtype: object
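One difference between the two variants above is ordering: unique keeps values in order of first appearance, while a set has no guaranteed order, which is why the second output may list the partners differently. A minimal sketch, assuming you want a deterministic list either way (the names mask, in_order and as_sorted are just for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Germany", "Afghanistan",
                "Canada", "Italy", "Canada", "USA"],
    "commodity": [np.nan, 2, 2, np.nan, np.nan, 3, np.nan, np.nan],
})

# Boolean mask: True where commodity is missing
mask = df1["commodity"].isnull()

# unique() preserves the order in which values first appear
in_order = df1.loc[mask, "partner"].unique().tolist()

# a set is unordered, so sort it if you need a stable result
as_sorted = sorted(set(df1.loc[mask, "partner"]))
```

For this data both lists happen to come out the same, because the partners already appear in alphabetical order.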
loc + isnull + drop_duplicates
You can filter your series and then drop duplicates:
res = df1.loc[df1['commodity'].isnull(), 'partner'].drop_duplicates()
print(res)
0 Afghanistan
4 Canada
7 USA
Name: partner, dtype: object
Step 1
Filter to keep only the partners whose commodity is NaN:
v = df1.loc[df1.commodity.isna(), 'partner']
Or,
v = df1.partner[df1.commodity.isna()]
print(v)
0 Afghanistan
3 Afghanistan
4 Canada
6 Canada
7 USA
Name: partner, dtype: object
Step 2
Drop duplicates.
If you want a collection,
v.unique()
array(['Afghanistan', 'Canada', 'USA'], dtype=object)
Or,
set(v)
{'Afghanistan', 'Canada', 'USA'}
If you want a Series,
ser = v.drop_duplicates().reset_index(drop=True)
print(ser)
0 Afghanistan
1 Canada
2 USA
Name: partner, dtype: object
If you want a DataFrame,
df = ser.to_frame()
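The two steps above can be wrapped into one small helper. This is only a sketch: the function name partners_with_missing and its parameters are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

def partners_with_missing(df, key_col, value_col):
    """Return the unique key_col values whose value_col is NaN (hypothetical helper)."""
    # Step 1: keep only the rows where value_col is missing
    mask = df[value_col].isna()
    # Step 2: drop repeated keys and renumber the index from 0
    return df.loc[mask, key_col].drop_duplicates().reset_index(drop=True)

df1 = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Germany", "Afghanistan",
                "Canada", "Italy", "Canada", "USA"],
    "commodity": [np.nan, 2, 2, np.nan, np.nan, 3, np.nan, np.nan],
})

commodity_nan_partners = partners_with_missing(df1, "partner", "commodity")
```

Call .to_frame() on the result if you need a DataFrame instead of a Series.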
You can also use dropna; this is just a different idea:
set(df1.partner.tolist())-set(df1.dropna().partner.tolist())
Out[94]: {'Afghanistan', 'Canada', 'USA'}
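One thing to watch with the set difference: it returns partners that have no complete row at all, which is not quite the same as partners that have at least one NaN row. In the question's data the two coincide, but they can differ. A small sketch with made-up data, where Afghanistan has one NaN row and one valid row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Afghanistan appears with and without a commodity
df = pd.DataFrame({
    "partner": ["Afghanistan", "Afghanistan", "Canada", "Zambia"],
    "commodity": [np.nan, 5, np.nan, 2],
})

# Set difference: all partners minus partners that have a complete row
by_difference = set(df["partner"]) - set(df.dropna()["partner"])

# Mask: partners that have at least one row with a missing commodity
by_mask = set(df.loc[df["commodity"].isna(), "partner"])
```

Here by_difference contains only Canada, while by_mask contains both Afghanistan and Canada, so pick the one that matches what you actually mean.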
Another alternative:
>>> df1[df1.isnull().any(axis=1)]['partner'].drop_duplicates()
0 Afghanistan
4 Canada
7 USA
Name: partner, dtype: object
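Note that isnull().any(axis=1) flags a row if any column is missing, not just commodity. With only two columns that makes no difference, but it can once the frame has more. A sketch assuming a hypothetical extra column called value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Canada", "Italy"],
    "commodity": [np.nan, 2, np.nan, 3],
    "value": [10.0, np.nan, 20.0, 30.0],  # hypothetical extra column
})

# Rows with a NaN in any column: also picks up Zambia via the value column
any_col = df[df.isnull().any(axis=1)]["partner"].drop_duplicates().tolist()

# Rows with a NaN in commodity only
one_col = df.loc[df["commodity"].isnull(), "partner"].drop_duplicates().tolist()
```

If you only care about one column, checking that column directly is the safer choice.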
Using loc + np.isnan
>>> df1.loc[np.isnan(df1.commodity), 'partner'].drop_duplicates()
0 Afghanistan
4 Canada
7 USA
Name: partner, dtype: object
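np.isnan works here because commodity is a float column. On an object column (for example strings mixed with None) it raises a TypeError, whereas pd.isna handles both cases. A minimal sketch with made-up data to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where commodity holds strings, stored as object dtype
df = pd.DataFrame({
    "partner": ["Afghanistan", "Zambia", "Canada"],
    "commodity": pd.Series([None, "wheat", None], dtype=object),
})

# np.isnan is only defined for numeric input
try:
    np.isnan(df["commodity"].to_numpy())
    numpy_ok = True
except TypeError:
    numpy_ok = False

# pd.isna treats None and NaN as missing regardless of dtype
result = df.loc[pd.isna(df["commodity"]), "partner"].drop_duplicates().tolist()
```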