
Checking if list in Python dataframe contains specific value while ignoring NaNs

Let's assume my dataframe's second column contains lists of integers:

df = pd.DataFrame({"col_1":[1,2,3,4,5],"col_2":[[1,2],np.nan,[3,5,9],[2],[8,5]],"col_3":np.nan})

Output:

   col_1      col_2  col_3
0      1     [1, 2]    NaN
1      2        NaN    NaN
2      3  [3, 5, 9]    NaN
3      4        [2]    NaN
4      5     [8, 5]    NaN

I'd like to insert a 1 in column 3 if the int in column 1 can be found in the list of ints in column 2:

   col_1      col_2  col_3
0      1     [1, 2]    1
1      2        NaN    NaN
2      3  [3, 5, 9]    1
3      4        [2]    NaN
4      5     [8, 5]    1

I was trying to solve it like this:

for i in range(0,len(df)):
    if df["col_1"][i] in df["col_2"][i]:
        df["col_3"][i]=1

This gave me TypeError: argument of type 'float' is not iterable because of the NaN in column 2, and I couldn't work out a way to deal with it.

(I've tried to solve this using a different solution based on .isin, but this wouldn't work because of AttributeError: 'list' object has no attribute 'isin'.)

I then had the idea to replace all the NaN in col_2 with a 0 so that my initial for loop would be able to run through. There are no 0 in col_1 and never will be, so I'd be fine with that solution because it won't lead to wrong matches in col_3. To this end,

df.loc[df["col_2"].isnull(), "col_2"] = 0

is not enough because the if in can't deal with ints: TypeError: argument of type 'int' is not iterable. I would need the 0 to be inserted as an element of a list, but you can't just use =[0] instead. I've tried different things based on .at because it should be able to insert lists into cells, but I couldn't work it out.
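The placeholder idea described above can be made to work by building the replacement cell by cell instead of with a single .loc assignment. The following is only a minimal sketch, assuming every non-list cell in col_2 is a missing value; the name filled is introduced purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

# Replace every non-list cell (here: NaN) with the placeholder list [0].
# Series.apply calls the function once per cell, so each cell gets its own list.
filled = df["col_2"].apply(lambda v: v if isinstance(v, list) else [0])

# The original loop now runs, because every cell in `filled` is iterable.
for i in range(len(df)):
    if df["col_1"][i] in filled[i]:
        # .loc avoids the chained assignment df["col_3"][i] = 1
        df.loc[i, "col_3"] = 1
```

This keeps col_2 itself untouched and only uses the filled copy for the membership test.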

Thank you in advance for any advice!

You can filter out NaNs with an if-else on x['col_2'] == x['col_2'], because NaN == NaN is False (a list compares equal to itself, so only the NaN cells fail the test). To convert the True values to 1, map is used with a dictionary; False values are not in the dict, so NaN is returned for them:

f = lambda x: x['col_1'] in x['col_2'] if x['col_2'] == x['col_2'] else np.nan
df['col_3'] = df.apply(f, 1).map({True:1})
print (df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
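The self-comparison above relies on NaN being unequal to itself. An explicit type check is one alternative that may read more clearly. This is a sketch only, assuming that any cell in col_2 that is not a list should be treated as missing; the helper name contains_own_value is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

def contains_own_value(row):
    """Return 1 if the col_1 value is in the col_2 list, otherwise NaN."""
    cell = row["col_2"]
    # Non-list cells (such as NaN) are skipped instead of raising TypeError.
    if isinstance(cell, list) and row["col_1"] in cell:
        return 1
    return np.nan

# axis=1 passes each row to the function as a Series.
df["col_3"] = df.apply(contains_own_value, axis=1)
```

Because the function returns 1 or NaN directly, no follow-up map step is needed here.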

Or use DataFrame.dropna to remove the rows containing NaNs; when the result is assigned back as a new column, the removed rows are filled with NaN:

f = lambda x: x['col_1'] in x['col_2']
df['col_3'] = df.dropna(subset=['col_1', 'col_2']).apply(f, 1).map({True:1})
print (df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
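The dropna variant works because pandas aligns on the index when a shorter Series is assigned to a column. The same steps, split up as a sketch with intermediate names (kept, found) that are introduced here only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

# Rows 0, 2, 3 and 4 survive; row 1 is dropped because col_2 is NaN.
kept = df.dropna(subset=["col_1", "col_2"])

# A boolean Series indexed by the surviving row labels only.
found = kept.apply(lambda row: row["col_1"] in row["col_2"], axis=1)

# map({True: 1}) turns True into 1 and False (not in the dict) into NaN.
# The assignment aligns on the index, so the missing label 1 also becomes NaN.
df["col_3"] = found.map({True: 1})
```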

Use:

s=df.dropna(subset=['col_2','col_1'])
df['new']=pd.DataFrame(s.col_2.tolist(),index=s.index).isin(df.col_1).sum(1).loc[lambda x : x!=0]
df
   col_1      col_2  col_3  new
0      1     [1, 2]    NaN  1.0
1      2        NaN    NaN  NaN
2      3  [3, 5, 9]    NaN  1.0
3      4        [2]    NaN  NaN
4      5     [8, 5]    NaN  1.0
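The one-liner above packs several steps together. Broken apart, it might look like the following sketch; the intermediate names (expanded, hits, counts) are not part of the answer and are used here only to show each stage:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

s = df.dropna(subset=["col_2", "col_1"])

# Spread each list over the columns of a helper frame;
# shorter lists are padded with NaN on the right.
expanded = pd.DataFrame(s["col_2"].tolist(), index=s.index)

# Given a Series, DataFrame.isin matches on the index: each row is
# compared with the col_1 value that carries the same row label.
hits = expanded.isin(df["col_1"])

# Count the matches per row and keep only the rows with at least one.
counts = hits.sum(axis=1)
df["new"] = counts.loc[lambda x: x != 0]
```

Note that counts holds the number of matches, so a list containing the col_1 value twice would produce 2 rather than 1.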


 