I'm trying to remove duplicate rows based on columns a and c.
a b c
0 [1, 0] 1 ab
1 [0, 0] 2 bc
2 [1, 0] 3 ab
Desired output:
a b c
0 [1, 0] 1 ab
1 [0, 0] 2 bc
What I have tried: when column a does not contain lists, df.drop_duplicates(['a','c']) works.
When there is no string column c, pd.DataFrame(np.unique(df), columns=df.columns) works for dropping duplicate lists.
How should I proceed when one of the columns holds lists and the other holds strings?
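A minimal reproduction of the problem, assuming the sample data shown above (the variable names are only for illustration). Calling drop_duplicates on the list column fails because pandas hashes the values to find duplicates, and Python lists are unhashable:

```python
import pandas as pd

# Sample frame from the question: column 'a' holds lists, column 'c' holds strings.
df = pd.DataFrame({
    "a": [[1, 0], [0, 0], [1, 0]],
    "b": [1, 2, 3],
    "c": ["ab", "bc", "ab"],
})

try:
    # Duplicate detection hashes each value; hashing a list raises TypeError.
    df.drop_duplicates(["a", "c"])
    error = None
except TypeError as exc:
    error = exc

print(repr(error))  # shows the TypeError raised by the list column
```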
Method 1
Lists are not hashable, so pandas cannot use them to detect duplicates, but tuples are. Convert the lists in column a to tuples in a new column d:
df['d'] = df['a'].apply(lambda x: tuple(x) if type(x) is list else x)
a b c d
0 [1, 0] 1 ab (1, 0)
1 [0, 0] 2 bc (0, 0)
2 [1, 0] 3 ab (1, 0)
Then drop duplicates based on columns c and d:
df = df.drop_duplicates(subset=['c', 'd'])
Result:
a b c d
0 [1, 0] 1 ab (1, 0)
1 [0, 0] 2 bc (0, 0)
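The result above still carries the helper column d. A variant of the same tuple idea, sketched here under the assumption that you want to keep the original frame's columns unchanged, builds the tuples only in a temporary copy and uses the resulting duplicate mask to filter the original rows:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [[1, 0], [0, 0], [1, 0]],
    "b": [1, 2, 3],
    "c": ["ab", "bc", "ab"],
})

# Temporary copy in which the lists in 'a' become (hashable) tuples.
key = df.assign(a=df["a"].map(tuple))

# Flag repeated (a, c) pairs in the copy, keeping the first occurrence,
# then apply the inverted mask to the original frame.
result = df[~key.duplicated(subset=["a", "c"])]

print(result)
```

Because the mask shares the original index, the rows kept are the same as with Method 1, but column a still holds lists and no extra column is added.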
Method 2
Alternatively, convert the column containing lists to strings, which are hashable:
df['a'] = df['a'].astype(str)
df = df.drop_duplicates(subset=['a', 'c'])
Output:
a b c
0 [1, 0] 1 ab
1 [0, 0] 2 bc
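Note that Method 2 overwrites column a, so the [1, 0] shown in its output is now a string rather than a list. A minimal sketch that compares on the string form but keeps the original lists, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [[1, 0], [0, 0], [1, 0]],
    "b": [1, 2, 3],
    "c": ["ab", "bc", "ab"],
})

# Compare rows on str(list) in a temporary copy only.
mask = df.assign(a=df["a"].astype(str)).duplicated(subset=["a", "c"])

# Filter the original frame, so column 'a' still contains real lists.
result = df[~mask]

print(result)
```

One difference between the two methods: string comparison depends on how the values print, so [1.0, 0] and [1, 0] give different strings and are kept as distinct rows, while the tuple approach in Method 1 treats them as duplicates because 1.0 == 1.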