Check whether lists in a pandas DataFrame column contain a specific value while ignoring NaN

Question

Let's assume my dataframe's second column contains lists of integers:

df = pd.DataFrame({"col_1":[1,2,3,4,5],"col_2":[[1,2],np.nan,[3,5,9],[2],[8,5]],"col_3":np.nan})

Output:

   col_1      col_2  col_3
0      1     [1, 2]    NaN
1      2        NaN    NaN
2      3  [3, 5, 9]    NaN
3      4        [2]    NaN
4      5     [8, 5]    NaN

I'd like to insert a 1 in column 3 if the int in column 1 can be found in the list of ints in column 2:

   col_1      col_2  col_3
0      1     [1, 2]    1
1      2        NaN    NaN
2      3  [3, 5, 9]    1
3      4        [2]    NaN
4      5     [8, 5]    1

I was trying to solve it like this:

for i in range(0,len(df)):
    if df["col_1"][i] in df["col_2"][i]:
        df["col_3"][i]=1

This gave me TypeError: argument of type 'float' is not iterable because of the NaN in column 2, and I couldn't work out a way to deal with it.
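One way around this error, offered as a minimal sketch rather than a definitive fix, is to check that the cell actually holds a list before applying `in`, and to read and write single cells through `df.at` instead of chained indexing. The `isinstance` check assumes that every non-missing cell in col_2 is a list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

# Only test membership when the cell holds a list; NaN cells are skipped.
for i in df.index:
    cell = df.at[i, "col_2"]
    if isinstance(cell, list) and df.at[i, "col_1"] in cell:
        df.at[i, "col_3"] = 1

result = df["col_3"].tolist()
```

Rows 0, 2 and 4 receive a 1 here, while rows 1 and 3 keep their NaN.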

(I've tried to solve this using a different solution based on .isin, but this wouldn't work because of AttributeError: 'list' object has no attribute 'isin'.)

I then had the idea to replace all the NaN in col_2 with a 0 so that my initial for loop would be able to run through. There are no 0s in col_1 and there never will be, so I'd be fine with that solution because it won't lead to wrong matches in col_3. To this end,

df.loc[df["col_2"].isnull(), "col_2"] = 0

is not enough because the `in` test can't deal with ints: TypeError: argument of type 'int' is not iterable. I would need the 0 to be inserted as an element of a list, but you can't just use =[0] instead. I've tried different things based on .at because it should be able to insert lists into cells, but I couldn't work it out.
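For reference, one possible way to get a `[0]` list into the missing cells is to rebuild the column element by element with `Series.apply`. This is a sketch under the assumption that every non-missing cell of col_2 is already a list; it is not the approach taken in the answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
})

# Keep existing lists as they are; replace anything else (here: NaN) with [0].
df["col_2"] = df["col_2"].apply(lambda v: v if isinstance(v, list) else [0])

filled = df["col_2"].tolist()
```

After this step every cell of col_2 is a list, so the original loop with `in` no longer raises.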

Thank you in advance for any advice!

Answer 1

You can filter out NaNs with an if-else on x['col_2'] == x['col_2']: the comparison is False for a NaN cell because NaN == NaN is False. To convert the True values to 1, use map with a dictionary; False values are not in the dict, so they come back as NaN:

f = lambda x: x['col_1'] in x['col_2'] if x['col_2'] == x['col_2'] else np.nan
df['col_3'] = df.apply(f, axis=1).map({True: 1})
print(df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
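The two building blocks of this answer can be tried in isolation. The snippet below is only an illustrative sketch; the variable names are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# NaN compares unequal to itself, while a list compares equal to itself.
nan_equals_itself = np.nan == np.nan
list_equals_itself = [1, 2] == [1, 2]

# map with a dict turns every value that is not a key of the dict into NaN.
mapped = pd.Series([True, False, np.nan], dtype=object).map({True: 1})
mapped_values = mapped.tolist()
```

Only the `True` entry survives the `map` call as 1; both `False` and NaN end up as NaN.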

Or use DataFrame.dropna to remove the rows containing NaN first; when the result is assigned back as a new column, the removed rows are filled with NaN through index alignment:

f = lambda x: x['col_1'] in x['col_2']
df['col_3'] = df.dropna(subset=['col_1', 'col_2']).apply(f, axis=1).map({True: 1})
print(df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
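The reason the dropped rows reappear as NaN is that assigning a Series to a column aligns on index labels. A small sketch with made-up data (the names `partial` and `flag` are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})

# This Series has no entry for label 1, like a result computed after dropna.
partial = pd.Series([1, 1], index=[0, 2])

# Assignment aligns on the index, so row 1 is filled with NaN.
df["flag"] = partial
flags = df["flag"].tolist()
```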

Answer 2

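Use DataFrame.isin on the lists expanded into columns; when it is passed a Series, isin matches on the index, so each row is compared with its own col_1 value: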

s=df.dropna(subset=['col_2','col_1'])
df['new']=pd.DataFrame(s.col_2.tolist(),index=s.index).isin(df.col_1).sum(1).loc[lambda x : x!=0]
df
   col_1      col_2  col_3  new
0      1     [1, 2]    NaN  1.0
1      2        NaN    NaN  NaN
2      3  [3, 5, 9]    NaN  1.0
3      4        [2]    NaN  NaN
4      5     [8, 5]    NaN  1.0
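The one-liner above can be unpacked into separate steps. The following is a sketch of the same chain with intermediate names (`expanded`, `hits`, `counts`) that are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
})

# 1. Drop rows where either input is missing (row 1 here).
s = df.dropna(subset=["col_2", "col_1"])

# 2. Spread each list across columns; shorter lists are padded with NaN.
expanded = pd.DataFrame(s["col_2"].tolist(), index=s.index)

# 3. With a Series argument, isin compares each row against the Series value
#    carrying the same index label, i.e. against that row's own col_1.
hits = expanded.isin(df["col_1"])

# 4. Count matches per row and keep only the rows with at least one match.
counts = hits.sum(axis=1)
df["new"] = counts[counts != 0]

new_values = df["new"].tolist()
```

Because `sum` counts matches, a list that contains the col_1 value twice would produce 2 rather than 1 with this approach.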

Answer 1: 3, accepted, 2019-09-14 14:57:05

Answer 2: 1, 2019-09-14 15:03:58
