Find the occurrence based on the index values in pandas
I have a table like the following, which contains NaN values:
A B C D
0 4.0 85.0 85.0 2.0
1 34.0 89.0 89.0 7.0
2 100 99.0 99.0 10.0
3 148.0 100.0 100.0 27.0
4 nan 103.0 nan 30.0
What I want is to get all the unique numbers from the table, for which I have used
itertools.chain(*[df[j].unique().tolist() for j in df.columns])
which gives me the unique values of each column, chained together. Now the real problem is that I want output like the following:
id A B C D
2 0 0 0 1
4 1 0 0 0
7 0 0 0 1
10 0 0 0 1
27 0 0 0 1
30 0 0 0 1
34 1 0 0 0
...
85 0 1 1 0
89 0 1 1 0
100 1 1 1 0
Is there a way to do it?
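A minimal sketch that rebuilds the sample table so the answers can be tried directly. It assumes the NaN cells are real missing values (np.nan) and that every number is stored as a float. It also shows that the itertools.chain expression keeps a value once per column it appears in (plus any NaN), whereas stacking the frame and dropping NaN yields each number exactly once:

```python
import itertools

import numpy as np
import pandas as pd

# Sample table from the question (assumption: NaN cells are np.nan)
df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# The question's expression: unique values per column, chained together.
# A value shared by two columns (e.g. 85.0 in B and C) appears twice, and NaN is kept.
chained = list(itertools.chain(*[df[j].unique().tolist() for j in df.columns]))

# Each number exactly once, NaN removed, sorted ascending
unique_values = sorted(pd.unique(df.stack().dropna()))
print([int(v) for v in unique_values])
```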
Use get_dummies with DataFrame.stack, take the maximum per second index level, rename the column names to cast them to integers, and finally transpose:
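import pandas as pd
# .max(level=1) was removed in pandas 2.0; groupby(level=1).max() is the equivalent, astype(int) turns bool dummies into 0/1
df = pd.get_dummies(df.stack()).groupby(level=1).max().rename(columns=int).T.astype(int)
print(df)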
A B C D
2 0 0 0 1
4 1 0 0 0
7 0 0 0 1
10 0 0 0 1
27 0 0 0 1
30 0 0 0 1
34 1 0 0 0
85 0 1 1 0
89 0 1 1 0
99 0 1 1 0
100 1 1 1 0
103 0 1 0 0
148 1 0 0 0
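The one-liner is easier to follow when split into its steps. This sketch rebuilds the question's table (assuming the NaN cells are np.nan) and uses groupby(level=1).max(), since the level argument to max() no longer exists in pandas 2.0+. The extra dropna() is an addition that keeps the result the same whether or not the installed pandas version's stack() drops NaN by itself:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [4.0, 34.0, 100.0, 148.0, np.nan],
    "B": [85.0, 89.0, 99.0, 100.0, 103.0],
    "C": [85.0, 89.0, 99.0, 100.0, np.nan],
    "D": [2.0, 7.0, 10.0, 27.0, 30.0],
})

# 1. Long Series indexed by (row, column label), without NaN
s = df.stack().dropna()

# 2. One indicator column per distinct value (bool dtype in pandas >= 2.0)
dummies = pd.get_dummies(s)

# 3. Collapse the row level: does any row of column A/B/C/D hold the value?
per_column = dummies.groupby(level=1).max()

# 4. Float labels (2.0, 4.0, ...) -> ints, values on the index, 0/1 integers
result = per_column.rename(columns=int).T.astype(int)
print(result)
```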
Use Series.duplicated with keep=False:
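import pandas as pd
s = df.stack()
new_df = (pd.concat([s, s.duplicated(keep=False)], axis=1)  # column 0: value, column 1: True if it occurs more than once
            .set_index(0, append=True)[1]    # index (row, column, value) -> duplicate flag
            .unstack(1, fill_value=False)    # column labels A..D become columns
            .groupby(level=1).max()          # one row per value (.droplevel(None) left repeated values in the index)
            .astype(int))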
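Note that this approach marks a cell with 1 only when the value occurs more than once anywhere in the frame, so a value found in a single cell (such as 2 in column D) ends up as 0 in every column. A small sketch of what keep=False changes compared with the default keep='first':

```python
import pandas as pd

s = pd.Series([85.0, 89.0, 85.0, 2.0])

# Default keep='first': the first occurrence is not flagged, later repeats are
first = s.duplicated().tolist()                   # [False, False, True, False]

# keep=False: every occurrence of a repeated value is flagged
all_repeats = s.duplicated(keep=False).tolist()   # [True, False, True, False]

print(first, all_repeats)
```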
We can also use melt + pivot_table:
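import pandas as pd
df2 = df.melt()  # long format: 'variable' (column name) and 'value'
new_df = (df2.assign(dup=df2['value'].duplicated(keep=False))  # True if the value occurs more than once
             .pivot_table(index='value',
                          columns='variable',
                          values='dup',
                          fill_value=0)  # value absent from a column -> 0
             .astype(int))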
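The intermediate long table is easier to see on a tiny hypothetical frame (two columns, not the question's data). melt turns each cell into a (variable, value) row, duplicated(keep=False) flags values seen more than once, and pivot_table spreads the flags back out with one row per value; the single NaN is not a duplicate and its group is dropped by the grouping:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, not the question's data
df = pd.DataFrame({"A": [4.0, 100.0], "C": [100.0, np.nan]})

df2 = df.melt()  # rows: (A, 4.0), (A, 100.0), (C, 100.0), (C, NaN)
df2 = df2.assign(dup=df2["value"].duplicated(keep=False))
print(df2["dup"].tolist())  # [False, True, True, False]

# Default aggfunc is 'mean'; dup depends only on the value, so each cell is 0.0 or 1.0
table = (df2.pivot_table(index="value", columns="variable", values="dup", fill_value=0)
            .astype(int))
print(table)
```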
I have also implemented the following, which also solves the problem:
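import pandas as pd
# melt -> (variable, value) pairs; get_dummies one-hot encodes 'variable' into variable_A..variable_D
pd.get_dummies(df.melt()).dropna(how='any')\
    .groupby('value').sum()  # drop NaN values, then count each value per column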
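Because this version sums the dummy columns, it counts occurrences rather than recording presence: on a hypothetical frame where a value appears twice in one column, that cell becomes 2 (the question's table has no such repeat, so it stays 0/1 there). If strict 0/1 output is wanted, clipping at 1 is one option; that clip step is an addition, not part of the original answer. The one-hot columns are named variable_A, variable_B, and so on:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 5.0 appears twice in column A and once in column B
df = pd.DataFrame({"A": [5.0, 5.0, 7.0], "B": [5.0, np.nan, 8.0]})

counts = pd.get_dummies(df.melt()).dropna(how="any").groupby("value").sum()
print(counts)

# Cap every count at 1 to get a presence (0/1) table
presence = counts.clip(upper=1)
print(presence)
```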