Checking if list in Python dataframe contains specific value while ignoring NaNs
Let's assume my dataframe's second column contains lists of integers:
df = pd.DataFrame({"col_1":[1,2,3,4,5],"col_2":[[1,2],np.nan,[3,5,9],[2],[8,5]],"col_3":np.nan})
Output:
   col_1      col_2  col_3
0      1     [1, 2]    NaN
1      2        NaN    NaN
2      3  [3, 5, 9]    NaN
3      4        [2]    NaN
4      5     [8, 5]    NaN
I'd like to insert a 1 in column 3 if the int in column 1 can be found in the list of ints in column 2:
   col_1      col_2  col_3
0      1     [1, 2]      1
1      2        NaN    NaN
2      3  [3, 5, 9]      1
3      4        [2]    NaN
4      5     [8, 5]      1
I tried to solve it like this:
for i in range(0,len(df)):
    if df["col_1"][i] in df["col_2"][i]:
        df["col_3"][i]=1
This gave me TypeError: argument of type 'float' is not iterable because of the NaN in column 2, and I couldn't work out a way to deal with it.
(I've tried to solve this using a different solution based on .isin, but this wouldn't work because of AttributeError: 'list' object has no attribute 'isin'.)
I then had the idea to replace all the NaN in col_2 with a 0 so that my initial for loop would be able to run through. There are no 0s in col_1 and there never will be, so I'd be fine with that solution because it won't lead to wrong matches in col_3. To this end,
df.loc[df["col_2"].isnull(), "col_2"] = 0
is not enough because the in test can't deal with ints: TypeError: argument of type 'int' is not iterable. I would need the 0 to be inserted as an element of a list, but you can't just use =[0] instead. I've tried different things based on .at because it should be able to insert lists into cells, but I couldn't work it out.
Thank you in advance for any advice!
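The fill-with-[0] idea described in the question can be sketched with Series.apply, which avoids having to assign a list through .loc or .at. This is only a minimal sketch: it assumes every non-missing cell in col_2 is a Python list, and the helper name as_list is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

def as_list(value):
    # Hypothetical helper: keep real lists, turn anything else (here NaN) into [0].
    return value if isinstance(value, list) else [0]

filled = df["col_2"].apply(as_list)

# Row-wise membership test; zip avoids chained indexing like df["col_3"][i] = 1.
hits = [a in b for a, b in zip(df["col_1"], filled)]

# Write 1 where the value was found; all other rows keep NaN.
df.loc[hits, "col_3"] = 1
print(df)
```

The original col_2 is left untouched here; the [0] placeholder only exists in the temporary Series filled.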
You can filter out NaNs with an if-else on x['col_2'] == x['col_2'], because NaN == NaN is False. To convert the True values to 1, map is used with a dictionary; False values are not in the dict, so NaN is returned for them:
f = lambda x: x['col_1'] in x['col_2'] if x['col_2'] == x['col_2'] else np.nan
df['col_3'] = df.apply(f, 1).map({True:1})
print (df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
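The self-comparison trick works because NaN is not equal to itself, while a list is. Below is a small sketch of that check, next to an explicit isinstance test that may read more clearly. The function name contains_or_nan is hypothetical and is not part of the answer above.

```python
import numpy as np
import pandas as pd

nan = np.nan
print(nan == nan)        # False: NaN never equals itself
print([1, 2] == [1, 2])  # True: a list equals itself

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
})

def contains_or_nan(row):
    # Hypothetical helper: only test membership when the cell holds a list.
    if isinstance(row["col_2"], list):
        return row["col_1"] in row["col_2"]
    return np.nan

# True -> 1; False and NaN are not keys in the dict, so they become NaN.
df["col_3"] = df.apply(contains_or_nan, axis=1).map({True: 1})
print(df)
```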
Or use DataFrame.dropna to remove the rows with NaNs; when the result is assigned back as a new column, the removed rows are filled with NaN again through index alignment:
f = lambda x: x['col_1'] in x['col_2']
df['col_3'] = df.dropna(subset=['col_1', 'col_2']).apply(f, 1).map({True:1})
print (df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
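The dropna variant relies on pandas aligning on the index when a Series is assigned to a column: labels that are missing from the Series become NaN. A small sketch of just that behaviour, using made-up data and hypothetical names (a, b, partial):

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})

# A Series that only covers index labels 0 and 2.
partial = pd.Series([1, 1], index=[0, 2])

# Assignment aligns on the index; the missing label 1 becomes NaN.
df["b"] = partial
print(df["b"].tolist())  # [1.0, nan, 1.0]
```

The column ends up as float because NaN cannot be stored in a plain integer column, which is also why the outputs above show 1.0 rather than 1.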
Use:
s=df.dropna(subset=['col_2','col_1'])
df['new']=pd.DataFrame(s.col_2.tolist(),index=s.index).isin(df.col_1).sum(1).loc[lambda x : x!=0]
df
   col_1      col_2  col_3  new
0      1     [1, 2]    NaN  1.0
1      2        NaN    NaN  NaN
2      3  [3, 5, 9]    NaN  1.0
3      4        [2]    NaN  NaN
4      5     [8, 5]    NaN  1.0
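To see what this one-liner does, its intermediate steps can be unpacked. The following is only a sketch of the same approach; the intermediate names wide, matches and counts are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
})

s = df.dropna(subset=["col_2", "col_1"])

# One column per list position; shorter lists are padded with NaN.
wide = pd.DataFrame(s["col_2"].tolist(), index=s.index)

# With a Series argument, isin compares each row against the Series
# value at the same index label, not against the whole Series.
matches = wide.isin(df["col_1"])

# Count matches per row and keep only the rows with at least one.
counts = matches.sum(axis=1)
df["new"] = counts.loc[counts != 0]
print(df)
```

Note that counts holds the number of matching positions, so a list that contains the col_1 value more than once would give a value greater than 1 in this version.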