
Pandas Dataframe drop duplicates in a column of lists?

I'm trying to remove rows that are duplicated in columns a and c.

        a      b    c
0  [1, 0]      1    ab
1  [0, 0]      2    bc
2  [1, 0]      3    ab

Desired output:

        a      b    c
0  [1, 0]      1    ab
1  [0, 0]      2    bc

What I have tried: without column a being a list, df.drop_duplicates(['a','c']) works.

Without column c being a string, pd.DataFrame(np.unique(df), columns=df.columns) works for dropping duplicate lists.

How do I proceed when one of the columns contains lists and the other contains strings?
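For reference, here is a minimal way to reconstruct the example frame (the column names and values are taken from the tables above):

import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    'a': [[1, 0], [0, 0], [1, 0]],
    'b': [1, 2, 3],
    'c': ['ab', 'bc', 'ab'],
})

# This raises TypeError: unhashable type: 'list', because column 'a' holds lists
# df.drop_duplicates(['a', 'c'])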

Method 1

Lists are not hashable, so drop_duplicates cannot work on them directly, but tuples are.

# Convert the lists in column 'a' to hashable tuples in a helper column 'd'
df['d'] = df['a'].apply(lambda x: tuple(x) if isinstance(x, list) else x)

          a  b   c       d
0    [1, 0]  1  ab  (1, 0)
1    [0, 0]  2  bc  (0, 0)
2    [1, 0]  3  ab  (1, 0)

then

df = df.drop_duplicates(subset=['c', 'd'])

result:

         a  b   c       d
0    [1, 0]  1  ab  (1, 0)
1    [0, 0]  2  bc  (0, 0)
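
If you don't want to keep the helper column, you can drop it once the duplicates are removed (a small follow-up sketch; 'd' is the helper column created above):

df = df.drop_duplicates(subset=['c', 'd']).drop(columns='d')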

Method 2

You can convert columns containing lists to str.

# Represent the lists as strings so the values become hashable
df['a'] = df['a'].astype(str)
df = df.drop_duplicates(subset=['a', 'c'])

Output

        a      b    c
0  [1, 0]      1    ab
1  [0, 0]      2    bc
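
Note that this overwrites column a with its string representation. If you want to keep the original lists, one variation (a sketch, not part of the original answer) is to build the string key only for the duplicate check:

# Deduplicate on a temporary string key so column 'a' keeps its lists
mask = df.assign(a_key=df['a'].astype(str)).duplicated(subset=['a_key', 'c'])
df = df[~mask]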
