I have a dataframe with a column of lists:
full_list_to_check
0 NaN
1 NaN
2 [1, 2, 3, 4, 5]
3 [6, 6]
4 [11, 11]
I need to create a new column that holds the de-duplicated list for each row when the list contains duplicates, and the unchanged list otherwise.
  full_list_to_check          new_col
0                NaN              NaN
1                NaN              NaN
2    [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
3             [6, 6]              [6]
4           [11, 11]             [11]
I have tried this:
df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)))
But I get this error:
TypeError: 'float' object is not iterable
You must check for NaN first:
df['full_list_to_check'].apply(lambda x: list(set(x)) if not np.any(pd.isna(x)) else np.nan)
Update:
df['full_list_to_check'].apply(lambda x: list(set(x)) if x is not np.nan else np.nan)
0 NaN
1 NaN
2 [1, 2, 3, 4, 5]
3 [6]
4 [11]
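One caveat with the `x is not np.nan` version: it is an identity check, so it only recognises the `np.nan` object itself. The sketch below is a hedged illustration, not part of the original answer; `dedupe_if_present` is a hypothetical helper name. It uses the value-based `pd.isna` check instead, and only calls it on non-list values so that it always returns a single boolean.

```python
import numpy as np
import pandas as pd

def dedupe_if_present(x):
    """Hypothetical helper: de-duplicate lists, return NaN for missing scalars."""
    # Test the type first so pd.isna is only ever called on a scalar.
    if not isinstance(x, list) and pd.isna(x):
        return np.nan
    return list(set(x))

other_nan = float('nan')       # a NaN that is not the np.nan object
print(other_nan is np.nan)     # False: the identity check misses it
print(pd.isna(other_nan))      # True: the value-based check catches it
print(dedupe_if_present([6, 6]))  # [6]
```

It can be applied the same way as the lambdas above, for example `df['full_list_to_check'].apply(dedupe_if_present)`.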
You can try this:
df['new_col'] = df.loc[~df['full_list_to_check'].isna(), 'full_list_to_check'].apply(lambda x: list(set(x)))
  full_list_to_check          new_col
0                NaN              NaN
1                NaN              NaN
2    [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
3             [6, 6]              [6]
4           [11, 11]             [11]
You could use:
df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)) if isinstance(x, list) else x)
The other answers only work if the column contains nothing but lists and NaN.
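For reference, here is a minimal self-contained sketch of the `isinstance` approach on the sample data from the question. It also covers one thing the thread does not: `set` does not guarantee element order, so if the original order matters, `dict.fromkeys` can be used instead, because it keeps the first occurrence of each item. The helper name `unique_keep_order` is an assumption made for this illustration.

```python
import numpy as np
import pandas as pd

def unique_keep_order(x):
    """Hypothetical helper: drop duplicates from a list, keeping first occurrences."""
    if isinstance(x, list):
        # dict keys are unique and keep insertion order (Python 3.7+)
        return list(dict.fromkeys(x))
    # Anything that is not a list (NaN, None, strings, ...) is passed through as-is
    return x

# Sample data from the question
df = pd.DataFrame({
    'full_list_to_check': [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]
})

df['new_col'] = df['full_list_to_check'].apply(unique_keep_order)
print(df)

# Order of first occurrence is preserved
print(unique_keep_order([3, 1, 3, 2]))  # [3, 1, 2]
```

Note that this only works when the list elements are hashable, which is also true of the `set`-based versions.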