Counting the unique symbols in each column in Pandas

Question

I was wondering how to calculate the number of unique symbols that occur in a single column of a dataframe. For example:

df = pd.DataFrame({'col1': ['a', 'bbb', 'cc', ''], 'col2': ['ddd', 'eeeee', 'ff', 'ggggggg']})

  col1     col2
0    a      ddd
1  bbb    eeeee
2   cc       ff
3       ggggggg

It should calculate that col1 contains 3 unique symbols and col2 contains 4 unique symbols.

My code so far (but this might be wrong):

unique_symbols = [0]*203
i = 0
for col in df.columns:
    observed_symbols = []
    df_temp = df[[col]]
    df_temp = df_temp.astype('str')

    # This part is where I am not so sure
    for index, row in df_temp.iterrows():
        # Walk over every character of the cell in this row
        for symbol in row[col]:
            if symbol not in observed_symbols:
                observed_symbols.append(symbol)
    unique_symbols[i] = len(observed_symbols)
    i += 1

Thanks in advance.

Answer 1

Here is one way:

df.apply(lambda x: len(set(''.join(x.astype(str)))))

col1    3
col2    4
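
To see what the one-liner is doing, here is a minimal sketch of the same computation as an explicit loop, in the spirit of the code in the question. The helper name `count_unique_symbols` is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'bbb', 'cc', ''],
                   'col2': ['ddd', 'eeeee', 'ff', 'ggggggg']})

def count_unique_symbols(frame):
    """Return {column name: number of distinct characters in that column}."""
    counts = {}
    for col in frame.columns:
        observed = set()
        # Cast each cell to str so non-string cells do not break iteration
        for value in frame[col].astype(str):
            observed.update(value)  # adds every character of the cell
        counts[col] = len(observed)
    return counts

print(count_unique_symbols(df))  # {'col1': 3, 'col2': 4}
```

A set is used instead of a list so that the membership check does not have to scan all symbols seen so far.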

Answer 2

Option 1
str.join + set inside a dict comprehension
For problems like this, I'd prefer falling back to plain Python, because it's so much faster.

{c : len(set(''.join(df[c]))) for c in df.columns}

{'col1': 3, 'col2': 4}
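
If you want to check the speed claim on your own data, a rough timing sketch could look like the following. The function names `with_python` and `with_apply`, the repeat factor of 1000, and the run count of 20 are arbitrary choices for illustration; timings depend on data size and pandas version, so no numbers are shown here.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'bbb', 'cc', ''],
                   'col2': ['ddd', 'eeeee', 'ff', 'ggggggg']})

# Repeat the rows to get a larger frame to time against
big = pd.concat([df] * 1000, ignore_index=True)

def with_python(frame):
    # Dict comprehension from Option 1
    return {c: len(set(''.join(frame[c]))) for c in frame.columns}

def with_apply(frame):
    # DataFrame.apply version from Answer 1, converted to a dict
    return frame.apply(lambda x: len(set(''.join(x.astype(str))))).to_dict()

for fn in (with_python, with_apply):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(f'{fn.__name__}: {seconds:.4f} s for 20 runs')
```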

Option 2
agg
If you want to stay within pandas:

df.agg(lambda x: set(''.join(x)), axis=0).str.len()

Or,

df.agg(lambda x: len(set(''.join(x))), axis=0)

col1    3
col2    4
dtype: int64

Answer 3

Maybe:

df.sum().apply(set).str.len()
Out[673]: 
col1    3
col2    4
dtype: int64

Answer 4

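One more option. Note that this counts the unique symbols in each cell and then sums them, so it matches the per-column count only when no symbol occurs in more than one cell, as in the example. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.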

In [38]: df.applymap(lambda x: len(set(x))).sum()
Out[38]:
col1    3
col2    4
dtype: int64
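
A quick way to see when summing per-cell counts differs from counting over the whole column is to try data where the same symbol shows up in more than one cell. The frame `df2` below is hypothetical, and the plain generator expression stands in for `applymap(...).sum()` on a single column.

```python
import pandas as pd

# Hypothetical data: 'a' and 'b' each appear in two different cells
df2 = pd.DataFrame({'col1': ['ab', 'a', 'bc']})

# Unique symbols per cell, then summed: 2 + 1 + 2
per_cell_sum = sum(len(set(cell)) for cell in df2['col1'])

# Unique symbols across the whole column: {'a', 'b', 'c'}
per_column = len(set(''.join(df2['col1'])))

print(per_cell_sum, per_column)  # 5 3
```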

4 answers

Answer 1: score 5, 2018-03-26 20:21:10
Answer 2: score 5, accepted, 2018-03-26 20:22:10
Answer 3: score 5, 2018-03-26 20:23:20
Answer 4: score 1, 2018-03-26 21:42:21
