
How to find the number of unique values in comma-separated strings stored in a pandas DataFrame column?

x              Unique_in_x
5,5,6,7,8,6,8  4
5,9,8,0        4
5,9,8,0        4
3,2            2
5,5,6,7,8,6,8  4

Unique_in_x is my expected column. Sometimes the x column might contain strings as well.

You can use a list comprehension with a set:

df['Unique_in_x'] = [len(set(x.split(','))) for x in df['x']]
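For reference, here is a minimal self-contained sketch of the same idea. The sample frame reuses the values from the question; the last row of letter tokens and the `str(x)` cast are assumptions added to cover the "might be string also" case, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row of non-numeric tokens.
df = pd.DataFrame({"x": ["5,5,6,7,8,6,8", "5,9,8,0", "5,9,8,0", "3,2", "a,b,a"]})

# Split each cell on commas and count the distinct tokens with a set.
# str(x) is an assumed guard for cells that are not already strings.
df["Unique_in_x"] = [len(set(str(x).split(","))) for x in df["x"]]

print(df)
```

Because the tokens are compared as strings, this counts distinct text values, so "5" and " 5" (with a leading space) would be treated as different.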

Or use str.split and nunique:

df['Unique_in_x'] = df['x'].str.split(',', expand=True).nunique(axis=1)

Output:

               x  Unique_in_x
0  5,5,6,7,8,6,8            4
1        5,9,8,0            4
2        5,9,8,0            4
3            3,2            2
4  5,5,6,7,8,6,8            4
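To see why the split-based approach gives the right count even when rows have different numbers of tokens, the intermediate frame can be inspected. This is a sketch on a reduced sample; the name `tokens` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": ["5,5,6,7,8,6,8", "5,9,8,0", "3,2"]})

# expand=True yields one column per token; shorter rows are padded
# with missing values on the right.
tokens = df["x"].str.split(",", expand=True)
print(tokens)

# nunique(axis=1) counts distinct values per row and skips missing
# values by default (dropna=True), so the padding is not counted.
df["Unique_in_x"] = tokens.nunique(axis=1)
print(df)
```

The intermediate frame is as wide as the longest row, so on very large or very ragged data the list comprehension may be the lighter option.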

You can find the unique values in the list with np.unique() and then take the length:

import pandas as pd
import numpy as np

# The column is named 'x' (lowercase), matching the question's data
df['Unique_in_x'] = df['x'].apply(lambda x: len(np.unique(x.split(','))))
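A small runnable sketch of the np.unique() approach, assuming the column is named `x` as in the question and every cell is a string. It uses the `.size` attribute of the returned array, which is equivalent to `len()` here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": ["5,5,6,7,8,6,8", "5,9,8,0", "3,2"]})

# np.unique returns a sorted array of the distinct tokens.
first_row_tokens = np.unique("5,5,6,7,8,6,8".split(","))
print(first_row_tokens)

# Apply the same step to every cell and keep only the count.
df["Unique_in_x"] = df["x"].apply(lambda s: np.unique(s.split(",")).size)
print(df)
```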
