
Find the occurrence based on the index values in pandas

I have a table like the following, which may contain NaN values:

    A       B       C       D
0   4.0     85.0    85.0    2.0
1   34.0    89.0    89.0    7.0
2   100     99.0    99.0    10.0
3   148.0   100.0   100.0   27.0
4   nan     103.0   nan     30.0

What I want is to get all the unique numbers from the table, for which I have used

itertools.chain(*[df[j].unique().tolist() for j in df.columns])

That would give me all the unique values across the df. Now the real problem is that I want output like the following:

id  A  B  C  D
2   0  0  0  1
4   1  0  0  0
7   0  0  0  1
10  0  0  0  1
27  0  0  0  1
30  0  0  0  1
34  1  0  0  0
...
85  0  1  1  0
89  0  1  1  0
100 1  1  1  0

Is there a way to do it?
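On the unique-value step used in the question: chaining the per-column unique() results deduplicates only within each column, so a value shared by two columns appears twice, and NaN is kept. A minimal sketch of one way to get a single deduplicated list, assuming the table is rebuilt by hand as df (the variable names here are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# The question's table, rebuilt by hand.
df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# Flatten every cell into one 1-D array, drop NaN, then deduplicate.
values = df.to_numpy().ravel()
unique_values = np.unique(values[~np.isnan(values)]).astype(int).tolist()
print(unique_values)
```

np.unique also sorts the result, which matches the ordering of the desired output.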

Use get_dummies with DataFrame.stack, take the maximum per second index level, rename the column names to cast them to integers, and finally transpose:

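# DataFrame.max(level=1) was removed in pandas 2.0, so group by the level instead;
# dtype=int keeps the 0/1 output, since get_dummies returns booleans from pandas 2.0 on
df = pd.get_dummies(df.stack(), dtype=int).groupby(level=1).max().rename(columns=int).T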
print (df)
     A  B  C  D
2    0  0  0  1
4    1  0  0  0
7    0  0  0  1
10   0  0  0  1
27   0  0  0  1
30   0  0  0  1
34   1  0  0  0
85   0  1  1  0
89   0  1  1  0
99   0  1  1  0
100  1  1  1  0
103  0  1  0  0
148  1  0  0  0
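As a cross-check, the same presence table can probably be reproduced with melt plus pd.crosstab. This is a sketch of a swapped-in technique, not the method from the answer above; the frame is rebuilt by hand from the question, and the names long, column, value and presence are chosen here for illustration:

```python
import numpy as np
import pandas as pd

# The question's table, rebuilt by hand.
df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# Long format: one row per cell, with NaN cells dropped.
long = df.melt(var_name="column", value_name="value").dropna()

# Count occurrences of each value per column, then cap at 1 for a 0/1 flag.
presence = pd.crosstab(long["value"].astype(int), long["column"]).clip(upper=1)
print(presence)
```

The clip(upper=1) call only matters if a value repeats within one column; without it the cells hold counts rather than flags.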

Use Series.duplicated with keep=False, which flags every value that occurs more than once:

s = df.stack()
new_df = (pd.concat([s, s.duplicated(keep=False)], axis=1)
            .set_index(0, append=True)[1]
            .unstack(1, fill_value=False)
            .droplevel(None)
            .astype(int))
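To make the role of keep=False concrete, here is a small sketch on a made-up Series (the numbers are illustrative, not taken from the table). With the default keep='first', only the later repeats are flagged; with keep=False, every member of a repeated group is flagged:

```python
import pandas as pd

# Hypothetical values: 85 occurs twice, the others once.
s = pd.Series([85, 89, 85, 2])

first_only = s.duplicated()           # default keep='first'
all_repeats = s.duplicated(keep=False)

print(first_only.tolist())
print(all_repeats.tolist())
```

Note that this flags repetition rather than mere presence, so a value that occurs only once in the whole table ends up as 0 in every column.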

We can also use melt + pivot_table:

df2 = df.melt()
new_df = (df2.assign(dup=df2['value'].duplicated(keep=False))
             .pivot_table(index='value',
                          columns='variable',
                          values='dup',
                          fill_value=False)
             .astype(int))

I have also implemented the following, which serves the purpose as well:

pd.get_dummies(df.melt()).dropna(how='any')\
            .groupby('value').sum()
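One thing to keep in mind with the groupby-and-sum variant is that it yields occurrence counts, not flags. A sketch on a hypothetical toy frame (not the question's data) where a value repeats within one column; passing dtype=int and capping with clip(upper=1) are additions made here, not part of the original snippet:

```python
import pandas as pd

# Hypothetical toy table: the value 1 occurs twice within column A.
toy = pd.DataFrame({"A": [1, 1, 2], "B": [2, 3, 3]})

# melt() names the source column "variable" and the cell "value" by default.
long = toy.melt()
dummies = pd.get_dummies(long, columns=["variable"], dtype=int)

counts = dummies.groupby("value").sum()  # occurrences of each value per column
flags = counts.clip(upper=1)             # cap at 1 to get 0/1 presence flags

print(counts)
print(flags)
```

For the table in the question no value repeats within a column, so counts and flags coincide there.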
