Find occurrences by index value in pandas

Question

I have a table like the following, which contains NaN values:

    A       B       C       D
0   4.0     85.0    85.0    2.0
1   34.0    89.0    89.0    7.0
2   100     99.0    99.0    10.0
3   148.0   100.0   100.0   27.0
4   nan     103.0   nan     30.0

What I want is to get all the unique numbers from the table, for which I have used

itertools.chain(*[df[j].unique().tolist() for j in df.columns])

which gives me all the unique values across df. The real problem is that I want output like the following:

id  A  B  C  D
2   0  0  0  1
4   1  0  0  0
7   0  0  0  1
10  0  0  0  1
27  0  0  0  1
30  0  0  0  1
34  1  0  0  0
...
85  0  1  1  0
89  0  1  1  0
100 1  1  1  0

Is there a way to do it?
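
For anyone reproducing this, here is a minimal sketch of the sample table and of collecting its distinct values. The column values are copied from the table above. Dropping NaN through melt and dropna is an assumption about the intended handling of missing cells, not part of the original attempt, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks the missing cells.
df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# Flatten every column into one "value" column and drop the NaN cells.
values = df.melt()["value"].dropna()

# De-duplicate across the whole table (not per column) and sort.
unique_values = sorted(values.unique().astype(int).tolist())
print(unique_values)
```

Unlike chaining the per-column unique() results, this removes values that repeat across columns (such as 85 or 100) and leaves NaN out.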

Answer 1

Use get_dummies with DataFrame.stack, take the maximum per second index level, rename the column labels to cast them to integers, and finally transpose:

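import pandas as pd
# max(level=1) was removed in pandas 2.0; groupby(level=1).max() is the equivalent
df = pd.get_dummies(df.stack(), dtype=int).groupby(level=1).max().rename(columns=int).T
print(df)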
     A  B  C  D
2    0  0  0  1
4    1  0  0  0
7    0  0  0  1
10   0  0  0  1
27   0  0  0  1
30   0  0  0  1
34   1  0  0  0
85   0  1  1  0
89   0  1  1  0
99   0  1  1  0
100  1  1  1  0
103  0  1  0  0
148  1  0  0  0
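
The one-liner above can be unpacked into its individual steps. The following is a sketch that assumes a recent pandas release, where the removed max(level=1) is written as groupby(level=1).max() and dtype=int keeps the indicators as 0/1. The intermediate names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# 1. Long format: one value per (row, column) pair, with NaN removed.
stacked = df.stack().dropna()

# 2. One 0/1 indicator column per distinct value.
dummies = pd.get_dummies(stacked, dtype=int)

# 3. Collapse the rows per original column name (index level 1).
per_column = dummies.groupby(level=1).max()

# 4. Cast the float labels to int, then transpose so values form the index.
result = per_column.rename(columns=int).T
print(result)
```

Taking the maximum rather than the sum keeps each cell at 0 or 1 even if a value were to repeat within a single column.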

Answer 2

Use Series.duplicated with keep=False, which flags every value that occurs more than once:

s = df.stack()
new_df = (pd.concat([s, s.duplicated(keep=False)], axis=1)
            .set_index(0, append=True)[1]
            .unstack(1, fill_value=False)
            .droplevel(None)
            .astype(int))

We can also use melt + pivot_table:

df2 = df.melt()
new_df = (df2.assign(dup=df2['value'].duplicated(keep=False))
             .pivot_table(index='value',
                          columns='variable',
                          values='dup',
                          fill_value=False)
             .astype(int))
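
To make the role of keep=False concrete, here is a small sketch on a hypothetical Series. The values are illustrative and not the full table. It shows that every member of a repeated group is flagged, not only the later occurrences.

```python
import pandas as pd

# Hypothetical values: 85 appears twice, 100 three times, 2 only once.
s = pd.Series([85.0, 2.0, 85.0, 100.0, 100.0, 100.0])

# keep=False marks all occurrences of a repeated value as True.
flags = s.duplicated(keep=False)

# The default (keep="first") leaves the first occurrence unmarked.
flags_default = s.duplicated()

print(flags.tolist())
print(flags_default.tolist())
```

A value that occurs only once in the whole table therefore gets False, which is worth keeping in mind when comparing the result with the output requested in the question.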

Answer 3

I have also implemented the following, which serves the purpose as well:

pd.get_dummies(df.melt()).dropna(how='any')\
            .groupby('value').sum()
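
A variant of the same idea that yields bare column names and integer labels could look like the sketch below. It assumes that dummy-encoding only the variable column is acceptable. prefix and prefix_sep are regular get_dummies parameters, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# Long format with columns "variable" and "value"; drop the NaN cells first.
long = df.melt().dropna()

# Encode only "variable"; the empty prefix keeps the names A, B, C, D.
counts = (pd.get_dummies(long, columns=["variable"],
                         prefix="", prefix_sep="", dtype=int)
            .groupby("value").sum())

# Cast the float index labels (2.0, 4.0, ...) to integers.
counts.index = counts.index.astype(int)
print(counts)
```

Because sum is used, a cell holds the number of occurrences of the value in that column. For the sample data no value repeats within a column, so the result consists of zeros and ones.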

3 answers

Answer 1    3    2020-03-30 09:16:45

Answer 2    2    2020-03-30 09:16:40

Answer 3    0    2020-03-30 12:32:53
